Chapter 1


Introduction to Probability Theory 


1.1 The Binomial Asset Pricing Model 


The binomial asset pricing model provides a powerful tool to understand arbitrage pricing theory 
and probability theory. In this course, we shall use it for both these purposes. 


In the binomial asset pricing model, we model stock prices in discrete time, assuming that at each step, the stock price will change to one of two possible values. Let us begin with an initial positive stock price $S_0$. There are two positive numbers, $d$ and $u$, with

$$0 < d < u, \tag{1.1}$$

such that at the next period, the stock price will be either $dS_0$ or $uS_0$. Typically, we take $d$ and $u$ to satisfy $0 < d < 1 < u$, so a change of the stock price from $S_0$ to $dS_0$ represents a downward movement, and a change of the stock price from $S_0$ to $uS_0$ represents an upward movement. It is common to also have $d = 1/u$, and this will be the case in many of our examples. However, strictly speaking, for what we are about to do we need to assume only (1.1) and (1.2) below.


Of course, stock price movements are much more complicated than indicated by the binomial asset pricing model. We consider this simple model for three reasons. First of all, within this model the concept of arbitrage pricing and its relation to risk-neutral pricing is clearly illuminated. Secondly, the model is used in practice because with a sufficient number of steps, it provides a good, computationally tractable approximation to continuous-time models. Thirdly, within the binomial model we can develop the theory of conditional expectations and martingales, which lies at the heart of continuous-time models.


With this third motivation in mind, we develop notation for the binomial model which is a bit different from that normally found in practice. Let us imagine that we are tossing a coin, and when we get a “Head,” the stock price moves up, but when we get a “Tail,” the price moves down. We denote the price at time 1 by $S_1(H) = uS_0$ if the toss results in head (H), and by $S_1(T) = dS_0$ if it results in tail (T).

Figure 1.1: Binomial tree of stock prices with $S_0 = 4$, $u = 1/d = 2$.

After the second toss, the price will be one of:

$$S_2(HH) = uS_1(H) = u^2 S_0, \qquad S_2(HT) = dS_1(H) = duS_0,$$

$$S_2(TH) = uS_1(T) = udS_0, \qquad S_2(TT) = dS_1(T) = d^2 S_0.$$


After three tosses, there are eight possible coin sequences, although not all of them result in different 
stock prices at time 3. 


For the moment, let us assume that the third toss is the last one and denote by

$$\Omega = \{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\}$$

the set of all possible outcomes of the three tosses. The set $\Omega$ of all possible outcomes of a random experiment is called the sample space for the experiment, and the elements $\omega$ of $\Omega$ are called sample points. In this case, each sample point $\omega$ is a sequence of length three. We denote the $k$-th component of $\omega$ by $\omega_k$. For example, when $\omega = HTH$, we have $\omega_1 = H$, $\omega_2 = T$ and $\omega_3 = H$.


The stock price $S_k$ at time $k$ depends on the coin tosses. To emphasize this, we often write $S_k(\omega)$. Actually, this notation does not quite tell the whole story, for while $S_3$ depends on all of $\omega$, $S_2$ depends on only the first two components of $\omega$, $S_1$ depends on only the first component of $\omega$, and $S_0$ does not depend on $\omega$ at all. Sometimes we will use notation such as $S_2(\omega_1, \omega_2)$ just to record more explicitly how $S_2$ depends on $\omega = (\omega_1, \omega_2, \omega_3)$.

Example 1.1 Set $S_0 = 4$, $u = 2$ and $d = \frac{1}{2}$. We have then the binomial “tree” of possible stock prices shown in Fig. 1.1. Each sample point $\omega = (\omega_1, \omega_2, \omega_3)$ represents a path through the tree. Thus, we can think of the sample space $\Omega$ as either the set of all possible outcomes from three coin tosses or as the set of all possible paths through the tree.
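The tree of Example 1.1 is small enough to enumerate directly. The following minimal Python sketch encodes each sample point $\omega$ as a string such as "HTH" and computes $S_k(\omega)$ for $k = 0, 1, 2, 3$ with exact fractions; the helper name `stock_price` is our own and not from the text.

```python
from fractions import Fraction
from itertools import product

# Parameters of Example 1.1.
S0 = Fraction(4)
u = Fraction(2)
d = Fraction(1, 2)

def stock_price(omega, k):
    """Return S_k(omega): multiply S0 by u for each H and by d for each T among the first k tosses."""
    price = S0
    for toss in omega[:k]:
        price *= u if toss == "H" else d
    return price

# Sample space of three tosses, in the order HHH, HHT, ..., TTT.
Omega = ["".join(t) for t in product("HT", repeat=3)]

for omega in Omega:
    print(omega, [str(stock_price(omega, k)) for k in range(4)])
```

For instance, the row for HTH is `['4', '8', '4', '8']`. Among the eight sequences only four distinct values of $S_3$ appear (32, 8, 2 and 1/2), since the time-3 price depends only on the number of heads.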


To complete our binomial asset pricing model, we introduce a money market with interest rate $r$; one dollar invested in the money market becomes $1 + r$ dollars in the next period. We take $r$ to be the interest rate for both borrowing and lending. (This is not as ridiculous as it first seems, because in many applications of the model, an agent is either borrowing or lending (not both) and knows in advance which she will be doing; in such an application, she should take $r$ to be the rate of interest for her activity.) We assume that


$$d < 1 + r < u. \tag{1.2}$$

The model would not make sense if we did not have this condition. For example, if $1 + r \ge u$, then the rate of return on the money market is always at least as great as, and sometimes greater than, the return on the stock, and no one would invest in the stock. The inequality $d \ge 1 + r$ cannot happen unless either $r$ is negative (which never happens, except maybe once upon a time in Switzerland) or $d \ge 1$. In the latter case, the stock does not really go “down” if we get a tail; it just goes up less than if we had gotten a head. One should borrow money at interest rate $r$ and invest in the stock, since even in the worst case, the stock price rises at least as fast as the debt used to buy it.


With the stock as the underlying asset, let us consider a European call option with strike price $K > 0$ and expiration time 1. This option confers the right to buy the stock at time 1 for $K$ dollars, and so is worth $S_1 - K$ at time 1 if $S_1 - K$ is positive and is otherwise worth zero. We denote by

$$V_1(\omega) = (S_1(\omega) - K)^+ \triangleq \max\{S_1(\omega) - K, 0\}$$

the value (payoff) of this option at expiration. Of course, $V_1(\omega)$ actually depends only on $\omega_1$, and we can and do sometimes write $V_1(\omega_1)$ rather than $V_1(\omega)$. Our first task is to compute the arbitrage price of this option at time zero.


Suppose at time zero you sell the call for $V_0$ dollars, where $V_0$ is still to be determined. You now have an obligation to pay off $(uS_0 - K)^+$ if $\omega_1 = H$ and to pay off $(dS_0 - K)^+$ if $\omega_1 = T$. At the time you sell the option, you don’t yet know which value $\omega_1$ will take. You hedge your short position in the option by buying $\Delta_0$ shares of stock, where $\Delta_0$ is still to be determined. You can use the proceeds $V_0$ of the sale of the option for this purpose, and then borrow if necessary at interest rate $r$ to complete the purchase. If $V_0$ is more than necessary to buy the $\Delta_0$ shares of stock, you invest the residual money at interest rate $r$. In either case, you will have $V_0 - \Delta_0 S_0$ dollars invested in the money market, where this quantity might be negative. You will also own $\Delta_0$ shares of stock.


If the stock goes up, the value of your portfolio (excluding the short position in the option) is

$$\Delta_0 S_1(H) + (1+r)(V_0 - \Delta_0 S_0),$$

and you need to have $V_1(H)$. Thus, you want to choose $V_0$ and $\Delta_0$ so that

$$V_1(H) = \Delta_0 S_1(H) + (1+r)(V_0 - \Delta_0 S_0). \tag{1.3}$$

If the stock goes down, the value of your portfolio is

$$\Delta_0 S_1(T) + (1+r)(V_0 - \Delta_0 S_0),$$

and you need to have $V_1(T)$. Thus, you want to choose $V_0$ and $\Delta_0$ to also have

$$V_1(T) = \Delta_0 S_1(T) + (1+r)(V_0 - \Delta_0 S_0). \tag{1.4}$$
These are two equations in two unknowns, and we solve them below.

Subtracting (1.4) from (1.3), we obtain

$$V_1(H) - V_1(T) = \Delta_0\big(S_1(H) - S_1(T)\big), \tag{1.5}$$

so that

$$\Delta_0 = \frac{V_1(H) - V_1(T)}{S_1(H) - S_1(T)}. \tag{1.6}$$
This is a discrete-time version of the famous “delta-hedging” formula for derivative securities, according to which the number of shares of an underlying asset a hedge should hold is the derivative (in the sense of calculus) of the value of the derivative security with respect to the price of the underlying asset. This formula is so pervasive that when a practitioner says “delta”, she means the derivative (in the sense of calculus) just described. Note, however, that my definition of $\Delta_0$ is the number of shares of stock one holds at time zero, and (1.6) is a consequence of this definition, not the definition of $\Delta_0$ itself. Depending on how uncertainty enters the model, there can be cases in which the number of shares of stock a hedge should hold is not the (calculus) derivative of the derivative security with respect to the price of the underlying asset.
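To see (1.6) and (1.7) at work, here is a minimal Python sketch using exact rational arithmetic. The function name `one_period_hedge` is ours, and the interest rate $r = 1/4$ and strike $K = 5$ are illustrative assumptions not taken from the text; the stock parameters are those of Example 1.1. The final loop checks that the resulting portfolio satisfies both (1.3) and (1.4).

```python
from fractions import Fraction

def one_period_hedge(S0, u, d, r, V1_H, V1_T):
    """Return (Delta0, V0) solving (1.3)-(1.4) for time-1 payoffs V1(H), V1(T)."""
    S1_H, S1_T = u * S0, d * S0
    delta0 = (V1_H - V1_T) / (S1_H - S1_T)                      # formula (1.6)
    V0 = ((1 + r - d) / (u - d) * V1_H
          + (u - (1 + r)) / (u - d) * V1_T) / (1 + r)           # formula (1.7)
    return delta0, V0

# Hypothetical inputs: Example 1.1 stock, r = 1/4, call with strike K = 5.
S0, u, d = Fraction(4), Fraction(2), Fraction(1, 2)
r, K = Fraction(1, 4), Fraction(5)
V1_H = max(u * S0 - K, Fraction(0))
V1_T = max(d * S0 - K, Fraction(0))
delta0, V0 = one_period_hedge(S0, u, d, r, V1_H, V1_T)

# The hedge must deliver the option payoff in both states, i.e. satisfy (1.3) and (1.4).
for S1, V1 in [(u * S0, V1_H), (d * S0, V1_T)]:
    assert delta0 * S1 + (1 + r) * (V0 - delta0 * S0) == V1

print(delta0, V0)  # 1/2 6/5
```

Exact fractions are used so that the replication check can be an equality rather than a floating-point tolerance.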


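To complete the solution of (1.3) and (1.4), we substitute (1.6) into either (1.3) or (1.4) and solve for $V_0$. After some simplification, this leads to the formula

$$V_0 = \frac{1}{1+r}\left[\frac{1+r-d}{u-d}\,V_1(H) + \frac{u-(1+r)}{u-d}\,V_1(T)\right]. \tag{1.7}$$

This is the arbitrage price for the European call option with payoff $V_1$ at time 1. To simplify this formula, we define

$$\tilde{p} \triangleq \frac{1+r-d}{u-d}, \qquad \tilde{q} \triangleq \frac{u-(1+r)}{u-d} = 1 - \tilde{p}, \tag{1.8}$$

so that (1.7) becomes

$$V_0 = \frac{1}{1+r}\left[\tilde{p}\,V_1(H) + \tilde{q}\,V_1(T)\right]. \tag{1.9}$$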


Because we have taken $d < u$, both $\tilde{p}$ and $\tilde{q}$ are defined, i.e., the denominator in (1.8) is not zero. Because of (1.2), both $\tilde{p}$ and $\tilde{q}$ are in the interval $(0, 1)$, and because they sum to 1, we can regard them as probabilities of $H$ and $T$, respectively. They are the risk-neutral probabilities. They appeared when we solved the two equations (1.3) and (1.4), and have nothing to do with the actual probabilities of getting $H$ or $T$ on the coin tosses. In fact, at this point, they are nothing more than a convenient tool for writing (1.7) as (1.9).
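The claim that $\tilde{p}$ and $\tilde{q}$ simply fall out of solving (1.3)-(1.4) can be checked numerically. The sketch below solves the $2 \times 2$ linear system in $(V_0, \Delta_0)$ by Cramer's rule and compares the answer with (1.9) for a few payoffs. The function names, the rate $r = 1/4$ and the sample payoffs are our own assumptions; note that no "actual" probability of heads appears anywhere in the computation.

```python
from fractions import Fraction

def risk_neutral_probs(u, d, r):
    """Return (p_tilde, q_tilde) from (1.8); requires 0 < d < 1 + r < u."""
    if not (0 < d < 1 + r < u):
        raise ValueError("need 0 < d < 1 + r < u, i.e. (1.1) and (1.2)")
    p_tilde = (1 + r - d) / (u - d)
    return p_tilde, 1 - p_tilde

def solve_linear_system(S0, u, d, r, V1_H, V1_T):
    """Solve (1.3)-(1.4), written as a*V0 + b*Delta0 = V1, by Cramer's rule."""
    a = 1 + r                              # coefficient of V0 in both equations
    b_H = u * S0 - (1 + r) * S0            # coefficient of Delta0 in (1.3)
    b_T = d * S0 - (1 + r) * S0            # coefficient of Delta0 in (1.4)
    det = a * b_T - a * b_H
    V0 = (V1_H * b_T - V1_T * b_H) / det
    delta0 = (a * V1_T - a * V1_H) / det
    return V0, delta0

S0, u, d, r = Fraction(4), Fraction(2), Fraction(1, 2), Fraction(1, 4)
p_tilde, q_tilde = risk_neutral_probs(u, d, r)
for V1_H, V1_T in [(Fraction(3), Fraction(0)), (Fraction(0), Fraction(3)), (Fraction(7), Fraction(-2))]:
    V0_formula = (p_tilde * V1_H + q_tilde * V1_T) / (1 + r)    # formula (1.9)
    V0_solved, _ = solve_linear_system(S0, u, d, r, V1_H, V1_T)
    assert V0_formula == V0_solved

print(p_tilde, q_tilde)  # 1/2 1/2
```

Raising an error when (1.2) fails mirrors the text: outside that range $\tilde{p}$ or $\tilde{q}$ would leave $(0, 1)$ and could no longer be read as probabilities.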


We now consider a European call with strike price $K$ that expires at time 2. At expiration, the payoff of this option is $V_2 \triangleq (S_2 - K)^+$, where $V_2$ and $S_2$ depend on $\omega_1$ and $\omega_2$, the first and second coin tosses. We want to determine the arbitrage price for this option at time zero. Suppose an agent sells the option at time zero for $V_0$ dollars, where $V_0$ is still to be determined. She then buys $\Delta_0$ shares of stock, investing $V_0 - \Delta_0 S_0$ dollars in the money market to finance this. At time 1, the agent has a portfolio (excluding the short position in the option) valued at


$$X_1 \triangleq \Delta_0 S_1 + (1+r)(V_0 - \Delta_0 S_0). \tag{1.10}$$

Although we do not indicate it in the notation, $S_1$ and therefore $X_1$ depend on $\omega_1$, the outcome of the first coin toss. Thus, there are really two equations implicit in (1.10):

$$X_1(H) = \Delta_0 S_1(H) + (1+r)(V_0 - \Delta_0 S_0),$$

$$X_1(T) = \Delta_0 S_1(T) + (1+r)(V_0 - \Delta_0 S_0).$$


After the first coin toss, the agent has $X_1$ dollars and can readjust her hedge. Suppose she decides to now hold $\Delta_1$ shares of stock, where $\Delta_1$ is allowed to depend on $\omega_1$ because the agent knows what value $\omega_1$ has taken. She invests the remainder of her wealth, $X_1 - \Delta_1 S_1$, in the money market. In the next period, her wealth will be given by the right-hand side of the following equation, and she wants it to be $V_2$. Therefore, she wants to have

$$V_2 = \Delta_1 S_2 + (1+r)(X_1 - \Delta_1 S_1). \tag{1.11}$$

Although we do not indicate it in the notation, $S_2$ and $V_2$ depend on $\omega_1$ and $\omega_2$, the outcomes of the first two coin tosses. Considering all four possible outcomes, we can write (1.11) as four equations:


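$$\begin{aligned}
V_2(HH) &= \Delta_1(H) S_2(HH) + (1+r)\big(X_1(H) - \Delta_1(H) S_1(H)\big), \\
V_2(HT) &= \Delta_1(H) S_2(HT) + (1+r)\big(X_1(H) - \Delta_1(H) S_1(H)\big), \\
V_2(TH) &= \Delta_1(T) S_2(TH) + (1+r)\big(X_1(T) - \Delta_1(T) S_1(T)\big), \\
V_2(TT) &= \Delta_1(T) S_2(TT) + (1+r)\big(X_1(T) - \Delta_1(T) S_1(T)\big).
\end{aligned}$$

We now have six equations, the two represented by (1.10) and the four represented by (1.11), in the six unknowns $V_0$, $\Delta_0$, $\Delta_1(H)$, $\Delta_1(T)$, $X_1(H)$, and $X_1(T)$.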


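To solve these equations, and thereby determine the arbitrage price $V_0$ at time zero of the option and the hedging portfolio $\Delta_0$, $\Delta_1(H)$ and $\Delta_1(T)$, we begin with the last two:

$$\begin{aligned}
V_2(TH) &= \Delta_1(T) S_2(TH) + (1+r)\big(X_1(T) - \Delta_1(T) S_1(T)\big), \\
V_2(TT) &= \Delta_1(T) S_2(TT) + (1+r)\big(X_1(T) - \Delta_1(T) S_1(T)\big).
\end{aligned}$$

Subtracting one of these from the other and solving for $\Delta_1(T)$, we obtain the “delta-hedging formula”

$$\Delta_1(T) = \frac{V_2(TH) - V_2(TT)}{S_2(TH) - S_2(TT)}, \tag{1.12}$$

and substituting this into either equation, we can solve for

$$X_1(T) = \frac{1}{1+r}\left[\tilde{p}\,V_2(TH) + \tilde{q}\,V_2(TT)\right]. \tag{1.13}$$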
Equation (1.13) gives the value the hedging portfolio should have at time 1 if the stock goes down between times 0 and 1. We define this quantity to be the arbitrage value of the option at time 1 if $\omega_1 = T$, and we denote it by $V_1(T)$. We have just shown that

$$V_1(T) \triangleq \frac{1}{1+r}\left[\tilde{p}\,V_2(TH) + \tilde{q}\,V_2(TT)\right]. \tag{1.14}$$


The hedger should choose her portfolio so that her wealth $X_1(T)$ if $\omega_1 = T$ agrees with $V_1(T)$ defined by (1.14). This formula is analogous to formula (1.9), but postponed by one step. The first two equations implicit in (1.11) lead in a similar way to the formulas

$$\Delta_1(H) = \frac{V_2(HH) - V_2(HT)}{S_2(HH) - S_2(HT)} \tag{1.15}$$

and $X_1(H) = V_1(H)$, where $V_1(H)$ is the value of the option at time 1 if $\omega_1 = H$, defined by

$$V_1(H) \triangleq \frac{1}{1+r}\left[\tilde{p}\,V_2(HH) + \tilde{q}\,V_2(HT)\right]. \tag{1.16}$$

This is again analogous to formula (1.9), postponed by one step. Finally, we plug the values $X_1(H) = V_1(H)$ and $X_1(T) = V_1(T)$ into the two equations implicit in (1.10). The solution of these equations for $\Delta_0$ and $V_0$ is the same as the solution of (1.3) and (1.4), and results again in (1.6) and (1.9).


The pattern emerging here persists, regardless of the number of periods. If $V_k$ denotes the value at time $k$ of a derivative security, and this depends on the first $k$ coin tosses $\omega_1, \ldots, \omega_k$, then at time $k-1$, after the first $k-1$ tosses $\omega_1, \ldots, \omega_{k-1}$ are known, the portfolio to hedge a short position should hold $\Delta_{k-1}(\omega_1, \ldots, \omega_{k-1})$ shares of stock, where

$$\Delta_{k-1}(\omega_1, \ldots, \omega_{k-1}) = \frac{V_k(\omega_1, \ldots, \omega_{k-1}, H) - V_k(\omega_1, \ldots, \omega_{k-1}, T)}{S_k(\omega_1, \ldots, \omega_{k-1}, H) - S_k(\omega_1, \ldots, \omega_{k-1}, T)}, \tag{1.17}$$

and the value at time $k-1$ of the derivative security, when the first $k-1$ coin tosses result in the outcomes $\omega_1, \ldots, \omega_{k-1}$, is given by

$$V_{k-1}(\omega_1, \ldots, \omega_{k-1}) = \frac{1}{1+r}\left[\tilde{p}\,V_k(\omega_1, \ldots, \omega_{k-1}, H) + \tilde{q}\,V_k(\omega_1, \ldots, \omega_{k-1}, T)\right]. \tag{1.18}$$
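The recursion (1.17)-(1.18) is a backward induction over the tree. Below is a minimal Python sketch of it; the function names, the interest rate $r = 1/4$ and the strike $K = 5$ are illustrative assumptions. Paths are encoded as strings, so `V["H"]` plays the role of $V_1(H)$ and `V[""]` the role of $V_0$. The `replicate` helper runs the hedge forward along a path, updating wealth as in (1.10) and (1.11), so that we can check it ends at the payoff.

```python
from fractions import Fraction
from itertools import product

def backward_induction(S0, u, d, r, payoff, N):
    """Apply (1.17)-(1.18) from time N back to time 0.

    `payoff` maps a toss string of length N (e.g. "HT") to V_N.  Returns dicts
    V and Delta keyed by toss prefixes; V[""] is the time-zero price V_0.
    """
    p_tilde = (1 + r - d) / (u - d)
    q_tilde = 1 - p_tilde

    def S(prefix):
        return S0 * u ** prefix.count("H") * d ** prefix.count("T")

    V = {"".join(w): payoff("".join(w)) for w in product("HT", repeat=N)}
    Delta = {}
    for k in range(N, 0, -1):
        for w in product("HT", repeat=k - 1):
            prefix = "".join(w)
            up, down = V[prefix + "H"], V[prefix + "T"]
            Delta[prefix] = (up - down) / (S(prefix + "H") - S(prefix + "T"))  # (1.17)
            V[prefix] = (p_tilde * up + q_tilde * down) / (1 + r)              # (1.18)
    return V, Delta

def replicate(S0, u, d, r, V, Delta, omega):
    """Start with wealth V_0, hold Delta shares each period, return final wealth."""
    X, S = V[""], S0
    for k, toss in enumerate(omega):
        shares = Delta[omega[:k]]
        S_next = S * (u if toss == "H" else d)
        X = shares * S_next + (1 + r) * (X - shares * S)   # as in (1.10) and (1.11)
        S = S_next
    return X

# Hypothetical example: two-period call with strike K = 5 on the stock of Example 1.1.
S0, u, d, r, K = Fraction(4), Fraction(2), Fraction(1, 2), Fraction(1, 4), Fraction(5)

def call_payoff(omega):
    return max(S0 * u ** omega.count("H") * d ** omega.count("T") - K, Fraction(0))

V, Delta = backward_induction(S0, u, d, r, call_payoff, 2)
for omega in ["HH", "HT", "TH", "TT"]:
    assert replicate(S0, u, d, r, V, Delta, omega) == V[omega]
print(V[""], Delta[""], Delta["H"], Delta["T"])  # 44/25 11/15 11/12 0
```

With these assumed parameters $\tilde{p} = \tilde{q} = \frac{1}{2}$, and the two-period quantities of (1.12)-(1.16) come out as $V_1(H) = 22/5$, $V_1(T) = 0$, $\Delta_1(H) = 11/12$ and $\Delta_1(T) = 0$.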
1.2 Finite Probability Spaces 
Let $\Omega$ be a set with finitely many elements. An example to keep in mind is

$$\Omega = \{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\}, \tag{2.1}$$

the set of all possible outcomes of three coin tosses. Let $\mathcal{F}$ be the set of all subsets of $\Omega$. Some sets in $\mathcal{F}$ are $\emptyset$, $\{HHH, HHT, HTH, HTT\}$, $\{TTT\}$, and $\Omega$ itself. How many sets are there in $\mathcal{F}$?
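Since $\mathcal{F}$ is the collection of all subsets of $\Omega$, its size can be found by brute-force enumeration. The sketch below lists every subset of the eight sample points (the helper name `power_set` is ours). Each sample point is either in or out of a given subset, so the count is $2^8 = 256$.

```python
from itertools import chain, combinations, product

Omega = ["".join(t) for t in product("HT", repeat=3)]

def power_set(elements):
    """Return every subset of `elements` as a frozenset."""
    return [frozenset(c) for c in chain.from_iterable(
        combinations(elements, n) for n in range(len(elements) + 1))]

F = power_set(Omega)
print(len(F))  # 256
```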
Definition 1.1 A probability measure $\mathbb{P}$ is a function mapping $\mathcal{F}$ into $[0, 1]$ with the following properties:

(i) $\mathbb{P}(\Omega) = 1$,

(ii) If $A_1, A_2, \ldots$ is a sequence of disjoint sets in $\mathcal{F}$, then

$$\mathbb{P}\left(\bigcup_{k=1}^{\infty} A_k\right) = \sum_{k=1}^{\infty} \mathbb{P}(A_k).$$


Probability measures have the following interpretation. Let $A$ be a set in $\mathcal{F}$. Imagine that $\Omega$ is the set of all possible outcomes of some random experiment. There is a certain probability, between 0 and 1, that when that experiment is performed, the outcome will lie in the set $A$. We think of $\mathbb{P}(A)$ as this probability.


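Example 1.2 Suppose a coin has probability $\frac{1}{3}$ for $H$ and $\frac{2}{3}$ for $T$. For the individual elements of $\Omega$ in (2.1), define

$$\mathbb{P}\{HHH\} = \left(\tfrac{1}{3}\right)^3, \quad \mathbb{P}\{HHT\} = \left(\tfrac{1}{3}\right)^2\left(\tfrac{2}{3}\right), \quad \mathbb{P}\{HTH\} = \left(\tfrac{1}{3}\right)^2\left(\tfrac{2}{3}\right), \quad \mathbb{P}\{HTT\} = \left(\tfrac{1}{3}\right)\left(\tfrac{2}{3}\right)^2,$$

$$\mathbb{P}\{THH\} = \left(\tfrac{1}{3}\right)^2\left(\tfrac{2}{3}\right), \quad \mathbb{P}\{THT\} = \left(\tfrac{1}{3}\right)\left(\tfrac{2}{3}\right)^2, \quad \mathbb{P}\{TTH\} = \left(\tfrac{1}{3}\right)\left(\tfrac{2}{3}\right)^2, \quad \mathbb{P}\{TTT\} = \left(\tfrac{2}{3}\right)^3.$$

For $A \in \mathcal{F}$, we define

$$\mathbb{P}(A) = \sum_{\omega \in A} \mathbb{P}\{\omega\}. \tag{2.2}$$

For example,

$$\mathbb{P}\{HHH, HHT, HTH, HTT\} = \left(\tfrac{1}{3}\right)^3 + 2\left(\tfrac{1}{3}\right)^2\left(\tfrac{2}{3}\right) + \left(\tfrac{1}{3}\right)\left(\tfrac{2}{3}\right)^2 = \frac{1}{3},$$

which is another way of saying that the probability of $H$ on the first toss is $\frac{1}{3}$.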


As in the above example, it is generally the case that we specify a probability measure on only some of the subsets of $\Omega$ and then use property (ii) of Definition 1.1 to determine $\mathbb{P}(A)$ for the remaining sets $A \in \mathcal{F}$. In the above example, we specified the probability measure only for the sets containing a single element, and then used Definition 1.1(ii) in the form (2.2) (see Problem 1.4(ii)) to determine $\mathbb{P}$ for all the other sets in $\mathcal{F}$.
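Formula (2.2) translates directly into code. Here is a minimal sketch with exact fractions; it assumes, as the singleton probabilities of Example 1.2 do, that the tosses are independent with probability $\frac{1}{3}$ for $H$. The helper names `prob_point` and `prob` are ours.

```python
from fractions import Fraction
from itertools import product

p_H, p_T = Fraction(1, 3), Fraction(2, 3)          # coin of Example 1.2
Omega = ["".join(t) for t in product("HT", repeat=3)]

def prob_point(omega):
    """P{omega}: a factor 1/3 for each H and 2/3 for each T."""
    return p_H ** omega.count("H") * p_T ** omega.count("T")

def prob(A):
    """P(A) as the sum of P{omega} over omega in A, formula (2.2)."""
    return sum((prob_point(w) for w in A), Fraction(0))

A_H = {w for w in Omega if w[0] == "H"}    # H on the first toss
print(prob(A_H), prob(Omega))              # 1/3 1
```

Because `prob` is defined as a sum over points, additivity over disjoint sets (property (ii) of Definition 1.1) holds automatically.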


Definition 1.2 Let $\Omega$ be a nonempty set. A $\sigma$-algebra is a collection $\mathcal{G}$ of subsets of $\Omega$ with the following three properties:

(i) $\emptyset \in \mathcal{G}$,

(ii) If $A \in \mathcal{G}$, then its complement $A^c \in \mathcal{G}$,

(iii) If $A_1, A_2, A_3, \ldots$ is a sequence of sets in $\mathcal{G}$, then $\bigcup_{k=1}^{\infty} A_k$ is also in $\mathcal{G}$.


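Here are some important $\sigma$-algebras of subsets of the set $\Omega$ in Example 1.2:

$$\mathcal{F}_0 = \{\emptyset, \Omega\},$$

$$\mathcal{F}_1 = \{\emptyset, \Omega, \{HHH, HHT, HTH, HTT\}, \{THH, THT, TTH, TTT\}\},$$

$$\mathcal{F}_2 = \{\emptyset, \Omega, \{HHH, HHT\}, \{HTH, HTT\}, \{THH, THT\}, \{TTH, TTT\}, \text{ and all sets which can be built by taking unions of these}\},$$

$$\mathcal{F}_3 = \mathcal{F} = \text{the set of all subsets of } \Omega.$$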


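To simplify notation a bit, let us define

$$A_H \triangleq \{HHH, HHT, HTH, HTT\} = \{H \text{ on the first toss}\},$$

$$A_T \triangleq \{THH, THT, TTH, TTT\} = \{T \text{ on the first toss}\},$$

so that

$$\mathcal{F}_1 = \{\emptyset, \Omega, A_H, A_T\},$$

and let us define

$$A_{HH} \triangleq \{HHH, HHT\} = \{HH \text{ on the first two tosses}\},$$

$$A_{HT} \triangleq \{HTH, HTT\} = \{HT \text{ on the first two tosses}\},$$

$$A_{TH} \triangleq \{THH, THT\} = \{TH \text{ on the first two tosses}\},$$

$$A_{TT} \triangleq \{TTH, TTT\} = \{TT \text{ on the first two tosses}\},$$

so that

$$\begin{aligned}
\mathcal{F}_2 = \{&\emptyset, \Omega, A_{HH}, A_{HT}, A_{TH}, A_{TT}, \\
&A_H, A_T, A_{HH} \cup A_{TH}, A_{HH} \cup A_{TT}, A_{HT} \cup A_{TH}, A_{HT} \cup A_{TT}, \\
&A_{HH}^c, A_{HT}^c, A_{TH}^c, A_{TT}^c\}.
\end{aligned}$$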


We interpret $\sigma$-algebras as a record of information. Suppose the coin is tossed three times, and you are not told the outcome, but you are told, for every set in $\mathcal{F}_1$, whether or not the outcome is in that set. For example, you would be told that the outcome is not in $\emptyset$ and is in $\Omega$. Moreover, you might be told that the outcome is not in $A_H$ but is in $A_T$. In effect, you have been told that the first toss was a $T$, and nothing more. The $\sigma$-algebra $\mathcal{F}_1$ is said to contain the “information of the first toss”, which is usually called the “information up to time 1”. Similarly, $\mathcal{F}_2$ contains the “information of the first two tosses,” which is the “information up to time 2.” The $\sigma$-algebra $\mathcal{F}_3 = \mathcal{F}$ contains “full information” about the outcome of all three tosses. The so-called “trivial” $\sigma$-algebra $\mathcal{F}_0$ contains no information. Knowing whether the outcome $\omega$ of the three tosses is in $\emptyset$ (it is not) and whether it is in $\Omega$ (it is) tells you nothing about $\omega$.
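On a finite sample space, each of $\mathcal{F}_0, \mathcal{F}_1, \mathcal{F}_2$ consists of all unions of the blocks of a partition of $\Omega$ (for $\mathcal{F}_2$, the blocks $A_{HH}, A_{HT}, A_{TH}, A_{TT}$). The sketch below builds them that way and checks properties (i)-(iii) of Definition 1.2. The helper names are ours, and the check on (iii) relies on the fact that, on a finite set, closure under pairwise unions already gives closure under countable unions.

```python
from itertools import combinations, product

Omega = frozenset("".join(t) for t in product("HT", repeat=3))

def unions_of(atoms):
    """All unions of the given disjoint blocks, including the empty union."""
    atoms = list(atoms)
    return {frozenset().union(*combo)
            for n in range(len(atoms) + 1) for combo in combinations(atoms, n)}

def blocks_by_first_tosses(k):
    """Partition Omega according to the outcome of the first k tosses."""
    groups = {}
    for w in Omega:
        groups.setdefault(w[:k], set()).add(w)
    return [frozenset(g) for g in groups.values()]

def is_sigma_algebra(G):
    """Check (i)-(iii) of Definition 1.2 for a collection G of subsets of Omega.

    On a finite set, closure under pairwise unions gives closure under countable unions.
    """
    return (frozenset() in G
            and all(Omega - A in G for A in G)
            and all(A | B in G for A in G for B in G))

F0, F1, F2 = (unions_of(blocks_by_first_tosses(k)) for k in (0, 1, 2))
print(len(F0), len(F1), len(F2))                          # 2 4 16
print(all(is_sigma_algebra(G) for G in (F0, F1, F2)))     # True
print(F0 <= F1 <= F2)                                     # True: each contains the previous
```

A collection such as $\{\emptyset, \Omega, A_H\}$ fails the check, because it is missing the complement $A_T$.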


Definition 1.3 Let $\Omega$ be a nonempty finite set. A filtration is a sequence of $\sigma$-algebras $\mathcal{F}_0, \mathcal{F}_1, \mathcal{F}_2, \ldots$ such that each $\sigma$-algebra in the sequence contains all the sets contained by the previous $\sigma$-algebra.


Definition 1.4 Let $\Omega$ be a nonempty finite set and let $\mathcal{F}$ be the $\sigma$-algebra of all subsets of $\Omega$. A random variable is a function mapping $\Omega$ into $\mathbb{R}$.


Example 1.3 Let $\Omega$ be given by (2.1) and consider the binomial asset pricing Example 1.1, where $S_0 = 4$, $u = 2$ and $d = \frac{1}{2}$. Then $S_0$, $S_1$, $S_2$ and $S_3$ are all random variables. For example, $S_2(HHT) = u^2 S_0 = 16$. The “random variable” $S_0$ is really not random, since $S_0(\omega) = 4$ for all $\omega \in \Omega$. Nonetheless, it is a function mapping $\Omega$ into $\mathbb{R}$, and thus technically a random variable, albeit a degenerate one.


A random variable maps $\Omega$ into $\mathbb{R}$, and we can look at the preimage under the random variable of sets in $\mathbb{R}$. Consider, for example, the random variable $S_2$ of Example 1.1. We have

$$S_2(HHH) = S_2(HHT) = 16,$$

$$S_2(HTH) = S_2(HTT) = S_2(THH) = S_2(THT) = 4,$$

$$S_2(TTH) = S_2(TTT) = 1.$$


Let us consider the interval $[4, 27]$. The preimage under $S_2$ of this interval is defined to be

$$\{\omega \in \Omega;\ S_2(\omega) \in [4, 27]\} = \{\omega \in \Omega;\ 4 \le S_2(\omega) \le 27\} = A_{TT}^c.$$

The complete list of subsets of $\Omega$ we can get as preimages of sets in $\mathbb{R}$ is:

$$\emptyset,\ \Omega,\ A_{HH},\ A_{HT} \cup A_{TH},\ A_{TT},$$

and sets which can be built by taking unions of these. This collection of sets is a $\sigma$-algebra, called the $\sigma$-algebra generated by the random variable $S_2$, and is denoted by $\sigma(S_2)$. The information content of this $\sigma$-algebra is exactly the information learned by observing $S_2$. More specifically, suppose the coin is tossed three times and you do not know the outcome $\omega$, but someone is willing to tell you, for each set in $\sigma(S_2)$, whether $\omega$ is in the set. You might be told, for example, that $\omega$ is not in $A_{HH}$, is in $A_{HT} \cup A_{TH}$, and is not in $A_{TT}$. Then you know that in the first two tosses, there was a head and a tail, and you know nothing more. This information is the same you would have gotten by being told that the value of $S_2(\omega)$ is 4.


Note that $\mathcal{F}_2$ defined earlier contains all the sets which are in $\sigma(S_2)$, and even more. This means that the information in the first two tosses is greater than the information in $S_2$. In particular, if you see the first two tosses, you can distinguish $A_{HT}$ from $A_{TH}$, but you cannot make this distinction from knowing the value of $S_2$ alone.
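The same idea computes $\sigma(S_2)$: on a finite $\Omega$, every preimage $\{S_2 \in A\}$ is a union of the level sets $\{S_2 = 16\} = A_{HH}$, $\{S_2 = 4\} = A_{HT} \cup A_{TH}$ and $\{S_2 = 1\} = A_{TT}$. In the sketch below (helper names ours), $\mathcal{F}_2$ is obtained by applying the same helper to the map $\omega \mapsto (\omega_1, \omega_2)$, which is not real-valued but generates $\mathcal{F}_2$ in the same way.

```python
from fractions import Fraction
from itertools import combinations, product

Omega = frozenset("".join(t) for t in product("HT", repeat=3))

def S2(omega):
    """S_2 of Example 1.1: S0 = 4, u = 2, d = 1/2."""
    heads = omega[:2].count("H")
    return 4 * Fraction(2) ** heads * Fraction(1, 2) ** (2 - heads)

def generated_sigma_algebra(X):
    """All unions of the level sets {X = x}; on a finite Omega this is sigma(X)."""
    level_sets = {}
    for w in Omega:
        level_sets.setdefault(X(w), set()).add(w)
    atoms = [frozenset(s) for s in level_sets.values()]
    return {frozenset().union(*c)
            for n in range(len(atoms) + 1) for c in combinations(atoms, n)}

sigma_S2 = generated_sigma_algebra(S2)
F2 = generated_sigma_algebra(lambda w: w[:2])    # information of the first two tosses
print(len(sigma_S2), len(F2))                    # 8 16
```

The strict inclusion of the first collection in the second is the statement that observing the first two tosses tells you more than observing $S_2$.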
Definition 1.5 Let $\Omega$ be a nonempty finite set and let $\mathcal{F}$ be the $\sigma$-algebra of all subsets of $\Omega$. Let $X$ be a random variable on $(\Omega, \mathcal{F})$. The $\sigma$-algebra $\sigma(X)$ generated by $X$ is defined to be the collection of all sets of the form $\{\omega \in \Omega;\ X(\omega) \in A\}$, where $A$ is a subset of $\mathbb{R}$. Let $\mathcal{G}$ be a sub-$\sigma$-algebra of $\mathcal{F}$. We say that $X$ is $\mathcal{G}$-measurable if every set in $\sigma(X)$ is also in $\mathcal{G}$.


Note: We normally write simply $\{X \in A\}$ rather than $\{\omega \in \Omega;\ X(\omega) \in A\}$.


Definition 1.6 Let $\Omega$ be a nonempty, finite set, let $\mathcal{F}$ be the $\sigma$-algebra of all subsets of $\Omega$, let $\mathbb{P}$ be a probability measure on $(\Omega, \mathcal{F})$, and let $X$ be a random variable on $\Omega$. Given any set $A \subseteq \mathbb{R}$, we define the induced measure of $A$ to be

$$\mathcal{L}_X(A) \triangleq \mathbb{P}\{X \in A\}.$$


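In other words, the induced measure of a set $A$ tells us the probability that $X$ takes a value in $A$. In the case of $S_2$ above with the probability measure of Example 1.2, some sets in $\mathbb{R}$ and their induced measures are:

$$\mathcal{L}_{S_2}(\emptyset) = \mathbb{P}(\emptyset) = 0,$$

$$\mathcal{L}_{S_2}(\mathbb{R}) = \mathbb{P}(\Omega) = 1,$$

$$\mathcal{L}_{S_2}[0, \infty) = \mathbb{P}(\Omega) = 1,$$

$$\mathcal{L}_{S_2}[0, 3] = \mathbb{P}\{S_2 = 1\} = \mathbb{P}(A_{TT}) = \left(\tfrac{2}{3}\right)^2.$$

In fact, the induced measure of $S_2$ places a mass of size $\left(\frac{1}{3}\right)^2 = \frac{1}{9}$ at the number 16, a mass of size $\frac{4}{9}$ at the number 4, and a mass of size $\left(\frac{2}{3}\right)^2 = \frac{4}{9}$ at the number 1. A common way to record this information is to give the cumulative distribution function $F_{S_2}(x)$ of $S_2$, defined by

$$F_{S_2}(x) \triangleq \mathbb{P}(S_2 \le x) = \begin{cases} 0, & \text{if } x < 1, \\ \frac{4}{9}, & \text{if } 1 \le x < 4, \\ \frac{8}{9}, & \text{if } 4 \le x < 16, \\ 1, & \text{if } 16 \le x. \end{cases} \tag{2.3}$$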


By the distribution of a random variable $X$, we mean any of the several ways of characterizing $\mathcal{L}_X$. If $X$ is discrete, as in the case of $S_2$ above, we can either tell where the masses are and how large they are, or tell what the cumulative distribution function is. (Later we will consider random variables $X$ which have densities, in which case the induced measure of a set $A \subseteq \mathbb{R}$ is the integral of the density over the set $A$.)
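The point masses of $\mathcal{L}_{S_2}$ and the cumulative distribution function (2.3) can be computed directly from the sample space. The sketch below assumes independent tosses with probability `p_H` of heads, which reproduces the singleton probabilities of Example 1.2 when `p_H` is $\frac{1}{3}$; the helper names are ours.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

def S2(omega):
    """S_2 of Example 1.1: S0 = 4, u = 2, d = 1/2."""
    heads = omega[:2].count("H")
    return 4 * Fraction(2) ** heads * Fraction(1, 2) ** (2 - heads)

def induced_measure(X, p_H):
    """Point masses of L_X, with independent tosses and probability p_H of heads."""
    masses = defaultdict(Fraction)
    for w in ("".join(t) for t in product("HT", repeat=3)):
        masses[X(w)] += p_H ** w.count("H") * (1 - p_H) ** w.count("T")
    return dict(masses)

def cdf(masses, x):
    """F_X(x) = P{X <= x}, summed from the point masses."""
    return sum((m for v, m in masses.items() if v <= x), Fraction(0))

L_S2 = induced_measure(S2, Fraction(1, 3))
for value in sorted(L_S2):
    print(value, L_S2[value], cdf(L_S2, value))
# 1 4/9 4/9
# 4 4/9 8/9
# 16 1/9 1
```

Calling `induced_measure(S2, Fraction(1, 2))` instead changes the masses (for example, to $\frac{1}{4}$ at 16) while `S2` itself is untouched.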


Important Note. In order to work through the concept of a risk-neutral measure, we set up the 
definitions to make a clear distinction between random variables and their distributions. 


A random variable is a mapping from $\Omega$ to $\mathbb{R}$, nothing more. It has an existence quite apart from discussion of probabilities. For example, in the discussion above, $S_2(TTH) = S_2(TTT) = 1$, regardless of whether the probability for $H$ is $\frac{1}{3}$ or $\frac{1}{2}$.
The distribution of a random variable is a measure $\mathcal{L}_X$ on $\mathbb{R}$, i.e., a way of assigning probabilities to sets in $\mathbb{R}$. It depends on the random variable $X$ and the probability measure $\mathbb{P}$ we use in $\Omega$. If we set the probability of $H$ to be $\frac{1}{3}$, then $\mathcal{L}_{S_2}$ assigns mass $\frac{1}{9}$ to the number 16. If we set the probability of $H$ to be $\frac{1}{2}$, then $\mathcal{L}_{S_2}$ assigns mass $\frac{1}{4}$ to the number 16. The distribution of $S_2$ has changed, but the random variable has not. It is still defined by

$$S_2(HHH) = S_2(HHT) = 16,$$

$$S_2(HTH) = S_2(HTT) = S_2(THH) = S_2(THT) = 4,$$

$$S_2(TTH) = S_2(TTT) = 1.$$


Thus, a random variable can have more than one distribution (a “market” or “objective” distribution, 
and a “risk-neutral” distribution). 


In a similar vein, two different random variables can have the same distribution. Suppose in the binomial model of Example 1.1, the probability of $H$ and the probability of $T$ are both $\frac{1}{2}$. Consider a European call with strike price 14 expiring at time 2. The payoff of the call at time 2 is the random variable $(S_2 - 14)^+$, which takes the value 2 if $\omega = HHH$ or $\omega = HHT$, and takes the value 0 in every other case. The probability the payoff is 2 is $\frac{1}{4}$, and the probability it is zero is $\frac{3}{4}$. Consider also a European put with strike price 3 expiring at time 2. The payoff of the put at time 2 is $(3 - S_2)^+$, which takes the value 2 if $\omega = TTH$ or $\omega = TTT$, and the value 0 otherwise. Like the payoff of the call, the payoff of the put is 2 with probability $\frac{1}{4}$ and 0 with probability $\frac{3}{4}$. The payoffs of the call and the put are different random variables having the same distribution.
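The call-and-put example can be checked mechanically: under the fair coin the two payoffs have the same distribution, yet they differ as functions on $\Omega$. Helper names in this sketch are ours.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

Omega = ["".join(t) for t in product("HT", repeat=3)]
P = {w: Fraction(1, 8) for w in Omega}       # fair coin: each outcome has probability 1/8

def S2(omega):
    """S_2 of Example 1.1: S0 = 4, u = 2, d = 1/2."""
    heads = omega[:2].count("H")
    return 4 * Fraction(2) ** heads * Fraction(1, 2) ** (2 - heads)

def call(omega):
    """Payoff (S_2 - 14)^+ of the call."""
    return max(S2(omega) - 14, Fraction(0))

def put(omega):
    """Payoff (3 - S_2)^+ of the put."""
    return max(3 - S2(omega), Fraction(0))

def law(X):
    """Distribution of X under P, as {value: probability}."""
    dist = Counter()
    for w in Omega:
        dist[X(w)] += P[w]
    return dict(dist)

print(law(call) == law(put))                      # True
print([w for w in Omega if call(w) != put(w)])    # ['HHH', 'HHT', 'TTH', 'TTT']
```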


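Definition 1.7 Let $\Omega$ be a nonempty, finite set, let $\mathcal{F}$ be the $\sigma$-algebra of all subsets of $\Omega$, let $\mathbb{P}$ be a probability measure on $(\Omega, \mathcal{F})$, and let $X$ be a random variable on $\Omega$. The expected value of $X$ is defined to be

$$\mathbb{E}X \triangleq \sum_{\omega \in \Omega} X(\omega)\,\mathbb{P}\{\omega\}. \tag{2.4}$$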
wEQ 


Notice that the expected value in (2.4) is defined to be a sum over the sample space $\Omega$. Since $\Omega$ is a finite set, $X$ can take only finitely many values, which we label $x_1, \ldots, x_n$. We can partition $\Omega$ into the subsets $\{X = x_1\}, \ldots, \{X = x_n\}$, and then rewrite (2.4) as

$$\begin{aligned}
\mathbb{E}X &\triangleq \sum_{\omega \in \Omega} X(\omega)\,\mathbb{P}\{\omega\} \\
&= \sum_{k=1}^{n} \sum_{\omega \in \{X = x_k\}} X(\omega)\,\mathbb{P}\{\omega\} \\
&= \sum_{k=1}^{n} x_k \sum_{\omega \in \{X = x_k\}} \mathbb{P}\{\omega\} \\
&= \sum_{k=1}^{n} x_k\,\mathbb{P}\{X = x_k\} \\
&= \sum_{k=1}^{n} x_k\,\mathcal{L}_X\{x_k\}.
\end{aligned}$$

Thus, although the expected value is defined as a sum over the sample space $\Omega$, we can also write it as a sum over $\mathbb{R}$.


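To make the above set of equations absolutely clear, we consider $S_2$ with the distribution given by (2.3). The definition of $\mathbb{E}S_2$ is

$$\begin{aligned}
\mathbb{E}S_2 &= S_2(HHH)\,\mathbb{P}\{HHH\} + S_2(HHT)\,\mathbb{P}\{HHT\} \\
&\quad + S_2(HTH)\,\mathbb{P}\{HTH\} + S_2(HTT)\,\mathbb{P}\{HTT\} \\
&\quad + S_2(THH)\,\mathbb{P}\{THH\} + S_2(THT)\,\mathbb{P}\{THT\} \\
&\quad + S_2(TTH)\,\mathbb{P}\{TTH\} + S_2(TTT)\,\mathbb{P}\{TTT\} \\
&= 16 \cdot \mathbb{P}(A_{HH}) + 4 \cdot \mathbb{P}(A_{HT} \cup A_{TH}) + 1 \cdot \mathbb{P}(A_{TT}) \\
&= 16 \cdot \mathbb{P}\{S_2 = 16\} + 4 \cdot \mathbb{P}\{S_2 = 4\} + 1 \cdot \mathbb{P}\{S_2 = 1\} \\
&= 16 \cdot \mathcal{L}_{S_2}\{16\} + 4 \cdot \mathcal{L}_{S_2}\{4\} + 1 \cdot \mathcal{L}_{S_2}\{1\} \\
&= 16 \cdot \tfrac{1}{9} + 4 \cdot \tfrac{4}{9} + 1 \cdot \tfrac{4}{9} \\
&= \tfrac{36}{9} = 4.
\end{aligned}$$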
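Both forms of the computation, the sum over $\Omega$ in (2.4) and the sum over the values of $S_2$, can be evaluated in a few lines. This sketch assumes the coin of Example 1.2 with independent tosses; the variable names are ours.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

p_H = Fraction(1, 3)                                       # coin of Example 1.2
Omega = ["".join(t) for t in product("HT", repeat=3)]
P = {w: p_H ** w.count("H") * (1 - p_H) ** w.count("T") for w in Omega}

def S2(omega):
    """S_2 of Example 1.1: S0 = 4, u = 2, d = 1/2."""
    heads = omega[:2].count("H")
    return 4 * Fraction(2) ** heads * Fraction(1, 2) ** (2 - heads)

# Definition (2.4): a sum over the sample space.
E_over_Omega = sum(S2(w) * P[w] for w in Omega)

# The same sum regrouped by the values x_k of S2, weighted by L_{S2}{x_k}.
L_S2 = defaultdict(Fraction)
for w in Omega:
    L_S2[S2(w)] += P[w]
E_over_values = sum(x * m for x, m in L_S2.items())

print(E_over_Omega, E_over_values)  # 4 4
```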


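Definition 1.8 Let $\Omega$ be a nonempty, finite set, let $\mathcal{F}$ be the $\sigma$-algebra of all subsets of $\Omega$, let $\mathbb{P}$ be a probability measure on $(\Omega, \mathcal{F})$, and let $X$ be a random variable on $\Omega$. The variance of $X$ is defined to be the expected value of $(X - \mathbb{E}X)^2$, i.e.,

$$\operatorname{Var}(X) \triangleq \sum_{\omega \in \Omega} (X(\omega) - \mathbb{E}X)^2\,\mathbb{P}\{\omega\}. \tag{2.5}$$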


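Once again, we can rewrite (2.5) as a sum over $\mathbb{R}$ rather than over $\Omega$. Indeed, if $X$ takes the values $x_1, \ldots, x_n$, then

$$\operatorname{Var}(X) = \sum_{k=1}^{n} (x_k - \mathbb{E}X)^2\,\mathbb{P}\{X = x_k\} = \sum_{k=1}^{n} (x_k - \mathbb{E}X)^2\,\mathcal{L}_X\{x_k\}.$$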
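The same check works for (2.5). With the coin of Example 1.2 (independent tosses assumed, names ours), the variance of $S_2$ computed over $\Omega$ and over the values of $S_2$ agree, and both equal 20.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

p_H = Fraction(1, 3)                                       # coin of Example 1.2
Omega = ["".join(t) for t in product("HT", repeat=3)]
P = {w: p_H ** w.count("H") * (1 - p_H) ** w.count("T") for w in Omega}

def S2(omega):
    """S_2 of Example 1.1: S0 = 4, u = 2, d = 1/2."""
    heads = omega[:2].count("H")
    return 4 * Fraction(2) ** heads * Fraction(1, 2) ** (2 - heads)

EX = sum(S2(w) * P[w] for w in Omega)

# (2.5): a sum over the sample space.
var_over_Omega = sum((S2(w) - EX) ** 2 * P[w] for w in Omega)

# The same quantity as a sum over the values x_k, weighted by L_{S2}{x_k}.
L_S2 = defaultdict(Fraction)
for w in Omega:
    L_S2[S2(w)] += P[w]
var_over_values = sum((x - EX) ** 2 * m for x, m in L_S2.items())

print(var_over_Omega, var_over_values)  # 20 20
```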


1.3 Lebesgue Measure and the Lebesgue Integral 


In this section, we consider the set of real numbers $\mathbb{R}$, which is uncountably infinite. We define the Lebesgue measure of intervals in $\mathbb{R}$ to be their length. This definition and the properties of measure determine the Lebesgue measure of many, but not all, subsets of $\mathbb{R}$. The collection of subsets of $\mathbb{R}$ we consider, and for which Lebesgue measure is defined, is the collection of Borel sets defined below.


We use Lebesgue measure to construct the Lebesgue integral, a generalization of the Riemann integral. We need this integral because, unlike the Riemann integral, it can be defined on abstract spaces, such as the space of infinite sequences of coin tosses or the space of paths of Brownian motion. This section concerns the Lebesgue integral on the space $\mathbb{R}$ only; the generalization to other spaces will be given later.
Definition 1.9 The Borel $\sigma$-algebra, denoted $\mathcal{B}(\mathbb{R})$, is the smallest $\sigma$-algebra containing all open intervals in $\mathbb{R}$. The sets in $\mathcal{B}(\mathbb{R})$ are called Borel sets.

Every set which can be written down and just about every set imaginable is in $\mathcal{B}(\mathbb{R})$. The following discussion of this fact uses the $\sigma$-algebra properties developed in Problem 1.3.


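By definition, every open interval $(a, b)$ is in $\mathcal{B}(\mathbb{R})$, where $a$ and $b$ are real numbers. Since $\mathcal{B}(\mathbb{R})$ is a $\sigma$-algebra, every countable union of open intervals is also in $\mathcal{B}(\mathbb{R})$. For example, for every real number $a$, the open half-line

$$(a, \infty) = \bigcup_{n=1}^{\infty} (a, a+n)$$

is a Borel set, as is

$$(-\infty, a) = \bigcup_{n=1}^{\infty} (a-n, a).$$

For real numbers $a$ and $b$, the union

$$(-\infty, a) \cup (b, \infty)$$

is Borel. Since $\mathcal{B}(\mathbb{R})$ is a $\sigma$-algebra, every complement of a Borel set is Borel, so $\mathcal{B}(\mathbb{R})$ contains

$$[a, b] = \big((-\infty, a) \cup (b, \infty)\big)^c.$$

This shows that every closed interval is Borel. In addition, the closed half-lines

$$[a, \infty) = \bigcup_{n=1}^{\infty} [a, a+n]$$

and

$$(-\infty, a] = \bigcup_{n=1}^{\infty} [a-n, a]$$

are Borel. Half-open and half-closed intervals are also Borel, since they can be written as intersections of open half-lines and closed half-lines. For example,

$$(a, b] = (-\infty, b] \cap (a, \infty).$$

Every set which contains only one real number is Borel. Indeed, if $a$ is a real number, then

$$\{a\} = \bigcap_{n=1}^{\infty} \left(a - \frac{1}{n},\ a + \frac{1}{n}\right).$$

This means that every set containing finitely many real numbers is Borel; if $A = \{a_1, a_2, \ldots, a_n\}$, then

$$A = \bigcup_{k=1}^{n} \{a_k\}.$$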

k=1 
In fact, every set containing countably infinitely many numbers is Borel; if $A = \{a_1, a_2, \ldots\}$, then

$$A = \bigcup_{k=1}^{\infty} \{a_k\}.$$

This means that the set of rational numbers is Borel, as is its complement, the set of irrational numbers.


There are, however, sets which are not Borel. We have just seen that any non-Borel set must have 
uncountably many points. 


Example 1.4 (The Cantor set.) This example gives a hint of how complicated a Borel set can be. 
We use it later when we discuss the sample space for an infinite sequence of coin tosses. 


Consider the unit interval $[0, 1]$, and remove the middle half, i.e., remove the open interval

$$A_1 \triangleq \left(\frac{1}{4}, \frac{3}{4}\right).$$

The remaining set

$$C_1 = \left[0, \frac{1}{4}\right] \cup \left[\frac{3}{4}, 1\right]$$

has two pieces. From each of these pieces, remove the middle half, i.e., remove the open set

$$A_2 \triangleq \left(\frac{1}{16}, \frac{3}{16}\right) \cup \left(\frac{13}{16}, \frac{15}{16}\right).$$

The remaining set

$$C_2 = \left[0, \frac{1}{16}\right] \cup \left[\frac{3}{16}, \frac{1}{4}\right] \cup \left[\frac{3}{4}, \frac{13}{16}\right] \cup \left[\frac{15}{16}, 1\right]$$

has four pieces. Continue this process, so at stage $k$, the set $C_k$ has $2^k$ pieces, and each piece has length $\frac{1}{4^k}$. The Cantor set

$$C \triangleq \bigcap_{k=1}^{\infty} C_k$$

is defined to be the set of points not removed at any stage of this nonterminating process.

Note that the length of $A_1$, the first set removed, is $\frac{1}{2}$. The “length” of $A_2$, the second set removed, is $\frac{1}{8} + \frac{1}{8} = \frac{1}{4}$. The “length” of the next set removed is $4 \cdot \frac{1}{32} = \frac{1}{8}$, and in general, the length of the $k$-th set removed is $2^{-k}$. Thus, the total length removed is

$$\sum_{k=1}^{\infty} \frac{1}{2^k} = 1,$$

and so the Cantor set, the set of points not removed, has zero “length.”
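The bookkeeping in this construction can be checked with exact arithmetic for the first few stages. The sketch below (function name ours) represents $C_k$ as a list of closed intervals and, at each stage, replaces $[a, b]$ by its two outer quarters, which is the same as removing the open middle half.

```python
from fractions import Fraction

def remove_middle_half(intervals):
    """Replace each closed interval [a, b] by [a, a + L/4] and [b - L/4, b], L = b - a."""
    result = []
    for a, b in intervals:
        quarter = (b - a) / 4
        result.extend([(a, a + quarter), (b - quarter, b)])
    return result

C = [(Fraction(0), Fraction(1))]          # start from [0, 1]
total_removed = Fraction(0)
for k in range(1, 6):
    length_before = sum(b - a for a, b in C)
    C = remove_middle_half(C)             # C is now C_k
    removed_k = length_before - sum(b - a for a, b in C)   # length of A_k
    total_removed += removed_k
    print(k, len(C), C[0][1] - C[0][0], removed_k, total_removed)
# 1 2 1/4 1/2 1/2
# 2 4 1/16 1/4 3/4
# 3 8 1/64 1/8 7/8
# 4 16 1/256 1/16 15/16
# 5 32 1/1024 1/32 31/32
```

Each column matches the text: $2^k$ pieces of length $4^{-k}$, a removed length of $2^{-k}$ at stage $k$, and a running total that approaches 1.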


Despite the fact that the Cantor set has no “length,” there are lots of points in this set. In particular, none of the endpoints of the pieces of the sets $C_1, C_2, \ldots$ is ever removed. Thus, the points

$$0,\ 1,\ \frac{1}{4},\ \frac{3}{4},\ \frac{1}{16},\ \frac{3}{16},\ \frac{13}{16},\ \frac{15}{16},\ \frac{1}{64},\ \ldots$$

are all in $C$. This is a countably infinite set of points. We shall see eventually that the Cantor set has uncountably many points. □
Definition 1.10 Let $\mathcal{B}(\mathbb{R})$ be the $\sigma$-algebra of Borel subsets of $\mathbb{R}$. A measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ is a function $\mu$ mapping $\mathcal{B}$ into $[0, \infty]$ with the following properties:

(i) $\mu(\emptyset) = 0$,

(ii) If $A_1, A_2, \ldots$ is a sequence of disjoint sets in $\mathcal{B}(\mathbb{R})$, then

$$\mu\left(\bigcup_{k=1}^{\infty} A_k\right) = \sum_{k=1}^{\infty} \mu(A_k).$$

Lebesgue measure is defined to be the measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ which assigns the measure of each interval to be its length. Following Williams’s book, we denote Lebesgue measure by $\mu_0$.


A measure has all the properties of a probability measure given in Problem 1.4, except that the total measure of the space is not necessarily 1 (in fact, $\mu_0(\mathbb{R}) = \infty$), one no longer has the equation

$$\mu(A^c) = 1 - \mu(A)$$

in Problem 1.4(iii), and property (v) in Problem 1.4 needs to be modified to say:

(v) If $A_1, A_2, \ldots$ is a sequence of sets in $\mathcal{B}(\mathbb{R})$ with $A_1 \supseteq A_2 \supseteq \cdots$ and $\mu(A_1) < \infty$, then

$$\mu\left(\bigcap_{k=1}^{\infty} A_k\right) = \lim_{n \to \infty} \mu(A_n).$$

To see that the additional requirement $\mu(A_1) < \infty$ is needed in (v), consider

$$A_1 = [1, \infty),\quad A_2 = [2, \infty),\quad A_3 = [3, \infty),\ \ldots.$$

Then $\bigcap_{k=1}^{\infty} A_k = \emptyset$, so $\mu_0\left(\bigcap_{k=1}^{\infty} A_k\right) = 0$, but $\lim_{n \to \infty} \mu_0(A_n) = \infty$.


We specify that the Lebesgue measure of each interval is its length, and that determines the Lebesgue 
measure of all other Borel sets. For example, the Lebesgue measure of the Cantor set in Example 
1.4 must be zero, because of the “length” computation given at the end of that example. 


The Lebesgue measure of a set containing only one point must be zero. In fact, since

$$\{a\} \subseteq \left(a - \frac{1}{n},\ a + \frac{1}{n}\right)$$

for every positive integer $n$, we must have

$$0 \le \mu_0\{a\} \le \mu_0\left(a - \frac{1}{n},\ a + \frac{1}{n}\right) = \frac{2}{n}.$$

Letting $n \to \infty$, we obtain

$$\mu_0\{a\} = 0.$$
The Lebesgue measure of a set containing countably many points must also be zero. Indeed, if $A = \{a_1, a_2, \ldots\}$, then

$$\mu_0(A) = \sum_{k=1}^{\infty} \mu_0\{a_k\} = \sum_{k=1}^{\infty} 0 = 0.$$


The Lebesgue measure of a set containing uncountably many points can be either zero, positive and 
finite, or infinite. We may not compute the Lebesgue measure of an uncountable set by adding up 
the Lebesgue measure of its individual members, because there is no way to add up uncountably 
many numbers. The integral was invented to get around this problem. 


In order to think about Lebesgue integrals, we must first consider the functions to be integrated. 


Definition 1.11 Let $f$ be a function from $\mathbb{R}$ to $\mathbb{R}$. We say that $f$ is Borel-measurable if the set $\{x \in \mathbb{R};\ f(x) \in A\}$ is in $\mathcal{B}(\mathbb{R})$ whenever $A \in \mathcal{B}(\mathbb{R})$. In the language of Section 1.2, we want the $\sigma$-algebra generated by $f$ to be contained in $\mathcal{B}(\mathbb{R})$.


Definition 1.11 is purely technical and has nothing to do with keeping track of information. It is difficult to conceive of a function which is not Borel-measurable, and we shall pretend such functions don’t exist. Henceforth, “function mapping $\mathbb{R}$ to $\mathbb{R}$” will mean “Borel-measurable function mapping $\mathbb{R}$ to $\mathbb{R}$” and “subset of $\mathbb{R}$” will mean “Borel subset of $\mathbb{R}$”.


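Definition 1.12 An indicator function $g$ from $\mathbb{R}$ to $\mathbb{R}$ is a function which takes only the values 0 and 1. We call
$$A \triangleq \{x \in \mathbb{R};\ g(x) = 1\}$$
the set indicated by $g$. We define the Lebesgue integral of $g$ to be
$$\int_{\mathbb{R}} g\,d\mu_0 \triangleq \mu_0(A).$$

A simple function $h$ from $\mathbb{R}$ to $\mathbb{R}$ is a linear combination of indicators, i.e., a function of the form
$$h(x) = \sum_{k=1}^{n} c_k g_k(x),$$
where each $g_k$ is of the form
$$g_k(x) = \begin{cases} 1, & \text{if } x \in A_k,\\ 0, & \text{if } x \notin A_k,\end{cases}$$
and each $c_k$ is a real number. We define the Lebesgue integral of $h$ to be
$$\int_{\mathbb{R}} h\,d\mu_0 \triangleq \sum_{k=1}^{n} c_k \int_{\mathbb{R}} g_k\,d\mu_0 = \sum_{k=1}^{n} c_k\,\mu_0(A_k).$$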


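Let $f$ be a nonnegative function defined on $\mathbb{R}$, possibly taking the value $\infty$ at some points. We define the Lebesgue integral of $f$ to be
$$\int_{\mathbb{R}} f\,d\mu_0 \triangleq \sup\left\{\int_{\mathbb{R}} h\,d\mu_0;\ h \text{ is simple and } h(x) \le f(x) \text{ for every } x \in \mathbb{R}\right\}.$$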




It is possible that this integral is infinite. If it is finite, we say that f is integrable. 
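The two definitions above can be tried out on a concrete case. The sketch below takes the illustrative choice $f(x) = x$ on $[0,1)$ (and $0$ elsewhere), which is not from the text, and builds the simple functions $h_n(x) = \lfloor 2^n x\rfloor/2^n$ on $[0,1)$. Each $h_n$ lies below $f$, and its integral is computed as $\sum_k c_k\,\mu_0(A_k)$ with every $A_k$ a half-open interval, so that $\mu_0$ is just length. The helper names are hypothetical. The integrals increase toward $1/2$, consistent with the supremum in the definition; the code illustrates, but does not prove, that the supremum equals $1/2$.

```python
from fractions import Fraction

def simple_integral(terms):
    """Lebesgue integral of h = sum_k c_k * I_{A_k}, each A_k = [a, b) given
    as a triple (c, a, b); here mu_0[a, b) is the length b - a."""
    return sum(c * (b - a) for c, a, b in terms)

def lower_simple_approx(n):
    """h_n(x) = floor(2^n x) / 2^n on [0, 1), as (c, a, b) triples.
    It satisfies 0 <= h_n(x) <= x = f(x) for x in [0, 1)."""
    step = Fraction(1, 2 ** n)
    return [(k * step, k * step, (k + 1) * step) for k in range(2 ** n)]

for n in (1, 2, 4, 8):
    print(n, simple_integral(lower_simple_approx(n)))   # 1/4, 3/8, 15/32, 255/512
```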


Finally, let $f$ be a function defined on $\mathbb{R}$, possibly taking the value $\infty$ at some points and the value $-\infty$ at other points. We define the positive and negative parts of $f$ to be
$$f^+(x) \triangleq \max\{f(x), 0\},\qquad f^-(x) \triangleq \max\{-f(x), 0\},$$
respectively, and we define the Lebesgue integral of $f$ to be
$$\int_{\mathbb{R}} f\,d\mu_0 \triangleq \int_{\mathbb{R}} f^+\,d\mu_0 - \int_{\mathbb{R}} f^-\,d\mu_0,$$
provided the right-hand side is not of the form $\infty - \infty$. If both $\int_{\mathbb{R}} f^+\,d\mu_0$ and $\int_{\mathbb{R}} f^-\,d\mu_0$ are finite (or equivalently, $\int_{\mathbb{R}} |f|\,d\mu_0 < \infty$, since $|f| = f^+ + f^-$), we say that $f$ is integrable.


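Let $f$ be a function defined on $\mathbb{R}$, possibly taking the value $\infty$ at some points and the value $-\infty$ at other points. Let $A$ be a subset of $\mathbb{R}$. We define
$$\int_A f\,d\mu_0 \triangleq \int_{\mathbb{R}} \mathbb{I}_A f\,d\mu_0,$$
where
$$\mathbb{I}_A(x) \triangleq \begin{cases} 1, & \text{if } x \in A,\\ 0, & \text{if } x \notin A,\end{cases}$$
is the indicator function of $A$.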


The Lebesgue integral just defined is related to the Riemann integral in one very important way: if 
the Riemann integral $\int_a^b f(x)\,dx$ is defined, then the Lebesgue integral $\int_{[a,b]} f\,d\mu_0$ agrees with the
Riemann integral. The Lebesgue integral has two important advantages over the Riemann integral. 
The first is that the Lebesgue integral is defined for more functions, as we show in the following 
examples. 


Example 1.5 Let $\mathbb{Q}$ be the set of rational numbers in $[0, 1]$, and consider $f \triangleq \mathbb{I}_{\mathbb{Q}}$. Being a countable set, $\mathbb{Q}$ has Lebesgue measure zero, and so the Lebesgue integral of $f$ over $[0, 1]$ is
$$\int_{[0,1]} f\,d\mu_0 = 0.$$

To compute the Riemann integral $\int_0^1 f(x)\,dx$, we choose partition points $0 = x_0 < x_1 < \cdots < x_n = 1$ and divide the interval $[0, 1]$ into subintervals $[x_0, x_1], [x_1, x_2], \ldots, [x_{n-1}, x_n]$. In each subinterval $[x_{k-1}, x_k]$ there is a rational point $q_k$, where $f(q_k) = 1$, and there is also an irrational point $r_k$, where $f(r_k) = 0$. We approximate the Riemann integral from above by the upper sum
$$\sum_{k=1}^{n} f(q_k)(x_k - x_{k-1}) = \sum_{k=1}^{n} 1 \cdot (x_k - x_{k-1}) = 1,$$
and we also approximate it from below by the lower sum
$$\sum_{k=1}^{n} f(r_k)(x_k - x_{k-1}) = \sum_{k=1}^{n} 0 \cdot (x_k - x_{k-1}) = 0.$$

No matter how fine we take the partition of $[0, 1]$, the upper sum is always 1 and the lower sum is always 0. Since these two do not converge to a common value as the partition becomes finer, the Riemann integral is not defined. □


Example 1.6 Consider the function
$$f(x) \triangleq \begin{cases} \infty, & \text{if } x = 0,\\ 0, & \text{if } x \ne 0.\end{cases}$$

This is not a simple function because a simple function cannot take the value $\infty$. Every simple function which lies between 0 and $f$ is of the form
$$h(x) \triangleq \begin{cases} y, & \text{if } x = 0,\\ 0, & \text{if } x \ne 0,\end{cases}$$
for some $y \in [0, \infty)$, and thus has Lebesgue integral
$$\int_{\mathbb{R}} h\,d\mu_0 = y\,\mu_0\{0\} = 0.$$
It follows that
$$\int_{\mathbb{R}} f\,d\mu_0 = \sup\left\{\int_{\mathbb{R}} h\,d\mu_0;\ h \text{ is simple and } h(x) \le f(x) \text{ for every } x \in \mathbb{R}\right\} = 0.$$

Now consider the Riemann integral $\int_{-\infty}^{\infty} f(x)\,dx$, which for this function $f$ is the same as the Riemann integral $\int_{-1}^{1} f(x)\,dx$. When we partition $[-1, 1]$ into subintervals, one of these will contain the point 0, and when we compute the upper approximating sum for $\int_{-1}^{1} f(x)\,dx$, this point will contribute $\infty$ times the length of the subinterval containing it. Thus the upper approximating sum is $\infty$. On the other hand, the lower approximating sum is 0, and again the Riemann integral does not exist. □


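The Lebesgue integral has all the linearity and comparison properties one would expect of an integral. In particular, for any two functions $f$ and $g$ and any real constant $c$,
$$\int_{\mathbb{R}} (f + g)\,d\mu_0 = \int_{\mathbb{R}} f\,d\mu_0 + \int_{\mathbb{R}} g\,d\mu_0,$$
$$\int_{\mathbb{R}} cf\,d\mu_0 = c\int_{\mathbb{R}} f\,d\mu_0,$$
and whenever $f(x) \le g(x)$ for all $x \in \mathbb{R}$, we have
$$\int_{\mathbb{R}} f\,d\mu_0 \le \int_{\mathbb{R}} g\,d\mu_0.$$

Finally, if $A$ and $B$ are disjoint sets, then
$$\int_{A \cup B} f\,d\mu_0 = \int_A f\,d\mu_0 + \int_B f\,d\mu_0.$$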
There are three convergence theorems satisfied by the Lebesgue integral. In each of these the situation is that there is a sequence of functions $f_n$, $n = 1, 2, \ldots$ converging pointwise to a limiting function $f$. Pointwise convergence just means that
$$\lim_{n\to\infty} f_n(x) = f(x) \text{ for every } x \in \mathbb{R}.$$
There are no such theorems for the Riemann integral, because the Riemann integral of the limiting function $f$ is too often not defined. Before we state the theorems, we give two examples of pointwise convergence which arise in probability theory.


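Example 1.7 Consider a sequence of normal densities, each with variance 1 and the $n$-th having mean $n$:
$$f_n(x) \triangleq \frac{1}{\sqrt{2\pi}}\,e^{-\frac{(x-n)^2}{2}}.$$

These converge pointwise to the function
$$f(x) = 0 \text{ for every } x \in \mathbb{R}.$$
We have $\int_{\mathbb{R}} f_n\,d\mu_0 = 1$ for every $n$, so $\lim_{n\to\infty}\int_{\mathbb{R}} f_n\,d\mu_0 = 1$, but $\int_{\mathbb{R}} f\,d\mu_0 = 0$. □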
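A quick numerical check of Example 1.7, offered only as a sketch: each $f_n$ still has total mass $1$ (approximated here by a midpoint Riemann sum over $[n-12,\,n+12]$, outside of which the normal mass is negligible), while the value at a fixed point such as $x = 0$ collapses toward $0$. The helper names and the truncation window are choices made for this illustration.

```python
import math

def f(n, x):
    """f_n(x): normal density with mean n and variance 1 (Example 1.7)."""
    return math.exp(-(x - n) ** 2 / 2) / math.sqrt(2 * math.pi)

def midpoint_integral(g, a, b, steps=20_000):
    """Midpoint Riemann sum approximating the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

for n in (1, 5, 20):
    # The mass outside [n - 12, n + 12] is below 1e-30, so this is essentially the full integral.
    mass = midpoint_integral(lambda x: f(n, x), n - 12, n + 12)
    print(n, round(mass, 8), f(n, 0.0))
```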


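Example 1.8 Consider a sequence of normal densities, each with mean 0 and the $n$-th having variance $\frac{1}{n}$:
$$f_n(x) \triangleq \sqrt{\frac{n}{2\pi}}\,e^{-\frac{nx^2}{2}}.$$

These converge pointwise to the function
$$f(x) \triangleq \begin{cases} \infty, & \text{if } x = 0,\\ 0, & \text{if } x \ne 0.\end{cases}$$

We have again $\int_{\mathbb{R}} f_n\,d\mu_0 = 1$ for every $n$, so $\lim_{n\to\infty}\int_{\mathbb{R}} f_n\,d\mu_0 = 1$, but $\int_{\mathbb{R}} f\,d\mu_0 = 0$. The function $f$ is not the Dirac delta; the Lebesgue integral of this function was already seen in Example 1.6 to be zero. □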
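The same kind of check for Example 1.8, again only a sketch. It uses the closed form $\int_{-a}^{a} f_n\,d\mu_0 = \operatorname{erf}\bigl(a\sqrt{n/2}\bigr)$, a standard identity for the $N(0, 1/n)$ density taken in place of numerical integration, to show the mass piling up near $0$, while $f_n(0) = \sqrt{n/(2\pi)}$ grows without bound and $f_n(x) \to 0$ at a fixed $x \ne 0$. The helper names are illustrative.

```python
import math

def f(n, x):
    """f_n(x): normal density with mean 0 and variance 1/n (Example 1.8)."""
    return math.sqrt(n / (2 * math.pi)) * math.exp(-n * x * x / 2)

def mass_within(n, a):
    """Integral of f_n over [-a, a], which equals erf(a * sqrt(n / 2))."""
    return math.erf(a * math.sqrt(n / 2))

for n in (1, 10, 100, 1000):
    print(n, f(n, 0.0), f(n, 0.5), mass_within(n, 0.5))
```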


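Theorem 3.1 (Fatou’s Lemma) Let $f_n$, $n = 1, 2, \ldots$ be a sequence of nonnegative functions converging pointwise to a function $f$. Then
$$\int_{\mathbb{R}} f\,d\mu_0 \le \liminf_{n\to\infty} \int_{\mathbb{R}} f_n\,d\mu_0.$$

If $\lim_{n\to\infty} \int_{\mathbb{R}} f_n\,d\mu_0$ is defined, then Fatou’s Lemma has the simpler conclusion
$$\int_{\mathbb{R}} f\,d\mu_0 \le \lim_{n\to\infty} \int_{\mathbb{R}} f_n\,d\mu_0.$$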


This is the case in Examples 1.7 and 1.8, where
$$\lim_{n\to\infty} \int_{\mathbb{R}} f_n\,d\mu_0 = 1,$$
while $\int_{\mathbb{R}} f\,d\mu_0 = 0$. We could modify either Example 1.7 or 1.8 by setting $g_n = f_n$ if $n$ is even, but $g_n = 2f_n$ if $n$ is odd. Now $\int_{\mathbb{R}} g_n\,d\mu_0 = 1$ if $n$ is even, but $\int_{\mathbb{R}} g_n\,d\mu_0 = 2$ if $n$ is odd. The sequence $\left\{\int_{\mathbb{R}} g_n\,d\mu_0\right\}_{n=1}^{\infty}$ has two cluster points, 1 and 2. By definition, the smaller one, 1, is $\liminf_{n\to\infty} \int_{\mathbb{R}} g_n\,d\mu_0$ and the larger one, 2, is $\limsup_{n\to\infty} \int_{\mathbb{R}} g_n\,d\mu_0$. Fatou’s Lemma guarantees that even the smaller cluster point will be greater than or equal to the integral of the limiting function.


The key assumption in Fatou’s Lemma is that all the functions take only nonnegative values. Fatou’s Lemma does not assume much, but it is not very satisfying because it does not conclude that
$$\int_{\mathbb{R}} f\,d\mu_0 = \lim_{n\to\infty} \int_{\mathbb{R}} f_n\,d\mu_0.$$
There are two sets of assumptions which permit this stronger conclusion.


Theorem 3.2 (Monotone Convergence Theorem) Let $f_n$, $n = 1, 2, \ldots$ be a sequence of functions converging pointwise to a function $f$. Assume that
$$0 \le f_1(x) \le f_2(x) \le f_3(x) \le \cdots \text{ for every } x \in \mathbb{R}.$$

Then
$$\int_{\mathbb{R}} f\,d\mu_0 = \lim_{n\to\infty} \int_{\mathbb{R}} f_n\,d\mu_0,$$
where both sides are allowed to be $\infty$.
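One way to see the Monotone Convergence Theorem at work, sketched under illustrative assumptions not taken from the text: let $f(x) = x^{-1/2}$ on $(0,1]$ (and $0$ elsewhere) and $f_n = \min(f, n)$. Then $0 \le f_1 \le f_2 \le \cdots$ and $f_n \to f$ pointwise, and an exact calculation gives $\int_{\mathbb{R}} f_n\,d\mu_0 = 2 - 1/n$, so the theorem yields $\int_{\mathbb{R}} f\,d\mu_0 = 2$. The function name below is hypothetical.

```python
from fractions import Fraction

def truncated_integral(n):
    """Exact integral over (0, 1] of f_n(x) = min(x ** -0.5, n), for an integer n >= 1.

    f_n equals n on (0, 1/n^2] and x ** -0.5 on (1/n^2, 1], so the integral is
    n * (1/n^2) + [2 * sqrt(x)] evaluated from 1/n^2 to 1, i.e. 1/n + (2 - 2/n).
    """
    flat_part = n * Fraction(1, n * n)
    curve_part = 2 - Fraction(2, n)
    return flat_part + curve_part

print([truncated_integral(n) for n in (1, 2, 10, 100)])
```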


Theorem 3.3 (Dominated Convergence Theorem) Let $f_n$, $n = 1, 2, \ldots$ be a sequence of functions, which may take either positive or negative values, converging pointwise to a function $f$. Assume that there is a nonnegative integrable function $g$ (i.e., $\int_{\mathbb{R}} g\,d\mu_0 < \infty$) such that
$$|f_n(x)| \le g(x) \text{ for every } x \in \mathbb{R} \text{ for every } n.$$

Then
$$\int_{\mathbb{R}} f\,d\mu_0 = \lim_{n\to\infty} \int_{\mathbb{R}} f_n\,d\mu_0,$$
and both sides will be finite.
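The hypothesis of an integrable dominating function is what fails in Example 1.7, and a short computation makes this concrete. For every $x \ge 1/2$ there is a positive integer $n$ with $|x - n| \le 1/2$, so $\sup_n f_n(x) \ge e^{-1/8}/\sqrt{2\pi}$; any $g$ with $|f_n| \le g$ for all $n$ would then be bounded below by this constant on $[1/2, \infty)$ and could not be integrable. The sketch below checks the lower bound at a few points; it truncates the supremum at `n_max`, an arbitrary cutoff chosen for this illustration.

```python
import math

def f(n, x):
    """f_n(x): normal density with mean n and variance 1 (Example 1.7)."""
    return math.exp(-(x - n) ** 2 / 2) / math.sqrt(2 * math.pi)

def envelope(x, n_max=2_000):
    """sup of f_n(x) over 1 <= n <= n_max; any dominating g must be at least this."""
    return max(f(n, x) for n in range(1, n_max + 1))

# Value of f_n(x) when |x - n| = 1/2: a lower bound for the envelope at every x >= 1/2.
lower_bound = math.exp(-1 / 8) / math.sqrt(2 * math.pi)

for x in (0.5, 3.2, 50.7, 999.5):
    print(x, envelope(x), envelope(x) >= lower_bound)
```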


1.4 General Probability Spaces 


Definition 1.13 A probability space $(\Omega, \mathcal{F}, \mathbb{P})$ consists of three objects:

(i) $\Omega$, a nonempty set, called the sample space, which contains all possible outcomes of some random experiment;

(ii) $\mathcal{F}$, a σ-algebra of subsets of $\Omega$;

(iii) $\mathbb{P}$, a probability measure on $(\Omega, \mathcal{F})$, i.e., a function which assigns to each set $A \in \mathcal{F}$ a number $\mathbb{P}(A) \in [0, 1]$, which represents the probability that the outcome of the random experiment lies in the set $A$.
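Remark 1.1 We recall from Homework Problem 1.4 that a probability measure $\mathbb{P}$ has the following properties:

(a) $\mathbb{P}(\emptyset) = 0$.

(b) (Countable additivity) If $A_1, A_2, \ldots$ is a sequence of disjoint sets in $\mathcal{F}$, then
$$\mathbb{P}\left(\bigcup_{k=1}^{\infty} A_k\right) = \sum_{k=1}^{\infty} \mathbb{P}(A_k).$$

(c) (Finite additivity) If $n$ is a positive integer and $A_1, \ldots, A_n$ are disjoint sets in $\mathcal{F}$, then
$$\mathbb{P}(A_1 \cup \cdots \cup A_n) = \mathbb{P}(A_1) + \cdots + \mathbb{P}(A_n).$$

(d) If $A$ and $B$ are sets in $\mathcal{F}$ and $A \subset B$, then
$$\mathbb{P}(B) = \mathbb{P}(A) + \mathbb{P}(B \setminus A).$$
In particular,
$$\mathbb{P}(B) \ge \mathbb{P}(A).$$

(e) (Continuity from below.) If $A_1, A_2, \ldots$ is a sequence of sets in $\mathcal{F}$ with $A_1 \subset A_2 \subset \cdots$, then
$$\mathbb{P}\left(\bigcup_{k=1}^{\infty} A_k\right) = \lim_{n\to\infty} \mathbb{P}(A_n).$$

(f) (Continuity from above.) If $A_1, A_2, \ldots$ is a sequence of sets in $\mathcal{F}$ with $A_1 \supset A_2 \supset \cdots$, then
$$\mathbb{P}\left(\bigcap_{k=1}^{\infty} A_k\right) = \lim_{n\to\infty} \mathbb{P}(A_n).$$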


We have already seen some examples of finite probability spaces. We repeat these and give some 
examples of infinite probability spaces as well. 


Example 1.9 Finite coin toss space.

Toss a coin $n$ times, so that $\Omega$ is the set of all sequences of $H$ and $T$ which have $n$ components. We will use this space quite a bit, and so give it a name: $\Omega_n$. Let $\mathcal{F}$ be the collection of all subsets of $\Omega_n$. Suppose the probability of $H$ on each toss is $p$, a number between zero and one. Then the probability of $T$ is $q \triangleq 1 - p$. For each $\omega = (\omega_1, \omega_2, \ldots, \omega_n)$ in $\Omega_n$, we define
$$\mathbb{P}\{\omega\} \triangleq p^{\text{Number of } H \text{ in } \omega} \cdot q^{\text{Number of } T \text{ in } \omega}.$$
For each $A \in \mathcal{F}$, we define
$$\mathbb{P}(A) \triangleq \sum_{\omega \in A} \mathbb{P}\{\omega\}. \tag{4.1}$$

We can define $\mathbb{P}(A)$ this way because $A$ has only finitely many elements, and so only finitely many terms appear in the sum on the right-hand side of (4.1). □
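Because $\Omega_n$ is finite, (4.1) can be evaluated by direct enumeration. A minimal sketch, representing each $\omega$ as a string of `'H'`/`'T'` and using exact fractions; the value $p = 1/3$, the event “at least two heads in three tosses”, and the function names are illustrative choices.

```python
from fractions import Fraction
from itertools import product

def coin_toss_space(n):
    """Omega_n: all sequences of n tosses, each written as a string like 'HTH'."""
    return ["".join(w) for w in product("HT", repeat=n)]

def prob_outcome(w, p):
    """P{w} = p^(number of H in w) * q^(number of T in w), with q = 1 - p."""
    q = 1 - p
    return p ** w.count("H") * q ** w.count("T")

def prob(A, p):
    """Equation (4.1): P(A) is the sum of P{w} over the finitely many w in A."""
    return sum((prob_outcome(w, p) for w in A), Fraction(0))

p = Fraction(1, 3)
omega = coin_toss_space(3)
at_least_two_heads = [w for w in omega if w.count("H") >= 2]
print(prob(omega, p), prob(at_least_two_heads, p))
```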
Example 1.10 Infinite coin toss space.

Toss a coin repeatedly without stopping, so that $\Omega$ is the set of all nonterminating sequences of $H$ and $T$. We call this space $\Omega_\infty$. This is an uncountably infinite space, and we need to exercise some care in the construction of the σ-algebra we will use here.

For each positive integer $n$, we define $\mathcal{F}_n$ to be the σ-algebra determined by the first $n$ tosses. For example, $\mathcal{F}_2$ contains four basic sets,
$$A_{HH} \triangleq \{\omega = (\omega_1, \omega_2, \omega_3, \ldots);\ \omega_1 = H, \omega_2 = H\} = \text{the set of all sequences which begin with } HH,$$
$$A_{HT} \triangleq \{\omega = (\omega_1, \omega_2, \omega_3, \ldots);\ \omega_1 = H, \omega_2 = T\} = \text{the set of all sequences which begin with } HT,$$
$$A_{TH} \triangleq \{\omega = (\omega_1, \omega_2, \omega_3, \ldots);\ \omega_1 = T, \omega_2 = H\} = \text{the set of all sequences which begin with } TH,$$
$$A_{TT} \triangleq \{\omega = (\omega_1, \omega_2, \omega_3, \ldots);\ \omega_1 = T, \omega_2 = T\} = \text{the set of all sequences which begin with } TT.$$
Because $\mathcal{F}_2$ is a σ-algebra, we must also put into it the sets $\emptyset$, $\Omega$, and all unions of the four basic sets.


In the σ-algebra $\mathcal{F}$, we put every set in every σ-algebra $\mathcal{F}_n$, where $n$ ranges over the positive integers. We also put in every other set which is required to make $\mathcal{F}$ be a σ-algebra. For example, the set containing the single sequence
$$\{HHHHH\cdots\} = \{H \text{ on every toss}\}$$
is not in any of the $\mathcal{F}_n$ σ-algebras, because it depends on all the components of the sequence and not just the first $n$ components. However, for each positive integer $n$, the set
$$\{H \text{ on the first } n \text{ tosses}\}$$
is in $\mathcal{F}_n$, and hence in $\mathcal{F}$. Therefore,
$$\{H \text{ on every toss}\} = \bigcap_{n=1}^{\infty} \{H \text{ on the first } n \text{ tosses}\}$$
is also in $\mathcal{F}$.


We next construct the probability measure $\mathbb{P}$ on $(\Omega_\infty, \mathcal{F})$ which corresponds to probability $p \in [0, 1]$ for $H$ and probability $q = 1 - p$ for $T$. Let $A \in \mathcal{F}$ be given. If there is a positive integer $n$ such that $A \in \mathcal{F}_n$, then the description of $A$ depends on only the first $n$ tosses, and it is clear how to define $\mathbb{P}(A)$. For example, suppose $A = A_{HH} \cup A_{TH}$, where these sets were defined earlier. Then $A$ is in $\mathcal{F}_2$. We set $\mathbb{P}(A_{HH}) = p^2$ and $\mathbb{P}(A_{TH}) = qp$, and then we have
$$\mathbb{P}(A) = \mathbb{P}(A_{HH} \cup A_{TH}) = p^2 + qp = (p + q)p = p.$$

In other words, the probability of an $H$ on the second toss is $p$.
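The phrase “it is clear how to define $\mathbb{P}(A)$” hides one point worth checking: an event in $\mathcal{F}_2$ also belongs to $\mathcal{F}_3, \mathcal{F}_4, \ldots$, and the answer must not depend on how many tosses we enumerate (the extra tosses contribute a factor $(p+q)^k = 1$). The sketch below, with a hypothetical helper name and the illustrative value $p = 2/5$, computes $\mathbb{P}(A_{HH} \cup A_{TH})$ by summing $p^{\#H}q^{\#T}$ over prefixes of several lengths.

```python
from fractions import Fraction
from itertools import product

def prob_first_n(event, n, p):
    """P(A) for an event A determined by the first n tosses: sum p^#H * q^#T
    over the length-n prefixes that satisfy `event`."""
    q = 1 - p
    total = Fraction(0)
    for prefix in product("HT", repeat=n):
        if event(prefix):
            total += p ** prefix.count("H") * q ** prefix.count("T")
    return total

def second_toss_is_H(prefix):
    """Membership test for A_HH union A_TH (only the second toss matters)."""
    return prefix[1] == "H"

p = Fraction(2, 5)
print([prob_first_n(second_toss_is_H, n, p) for n in (2, 3, 5)])
```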
Let us now consider a set $A \in \mathcal{F}$ for which there is no positive integer $n$ such that $A \in \mathcal{F}_n$. Such is the case for the set $\{H \text{ on every toss}\}$. To determine the probability of these sets, we write them in terms of sets which are in $\mathcal{F}_n$ for positive integers $n$, and then use the properties of probability measures listed in Remark 1.1. For example,
$$\{H \text{ on the first toss}\} \supset \{H \text{ on the first two tosses}\} \supset \{H \text{ on the first three tosses}\} \supset \cdots$$
and
$$\bigcap_{n=1}^{\infty} \{H \text{ on the first } n \text{ tosses}\} = \{H \text{ on every toss}\}.$$

According to Remark 1.1 (continuity from above),
$$\mathbb{P}\{H \text{ on every toss}\} = \lim_{n\to\infty} \mathbb{P}\{H \text{ on the first } n \text{ tosses}\} = \lim_{n\to\infty} p^n.$$

If $p = 1$, then $\mathbb{P}\{H \text{ on every toss}\} = 1$; otherwise, $\mathbb{P}\{H \text{ on every toss}\} = 0$.

A similar argument shows that if $0 < p < 1$ so that $0 < q < 1$, then every set in $\Omega_\infty$ which contains only one element (a nonterminating sequence of $H$ and $T$) has probability zero, and hence every set which contains countably many elements also has probability zero. We are in a case very similar to Lebesgue measure: every point has measure zero, but sets can have positive measure. Of course, the only sets which can have positive probability in $\Omega_\infty$ are those which contain uncountably many elements.


In the infinite coin toss space, we define a sequence of random variables $Y_1, Y_2, \ldots$ by
$$Y_k(\omega) \triangleq \begin{cases} 1 & \text{if } \omega_k = H,\\ 0 & \text{if } \omega_k = T,\end{cases}$$
and we also define the random variable
$$X(\omega) = \sum_{k=1}^{\infty} \frac{Y_k(\omega)}{2^k}.$$

Since each $Y_k$ is either zero or one, $X$ takes values in the interval $[0, 1]$. Indeed, $X(TTTT\cdots) = 0$, $X(HHHH\cdots) = 1$ and the other values of $X$ lie in between. We define a “dyadic rational number” to be a number of the form $\frac{m}{2^k}$, where $k$ and $m$ are integers. For example, $\frac{3}{4}$ is a dyadic rational. Every dyadic rational in $(0, 1)$ corresponds to two sequences $\omega \in \Omega_\infty$. For example,
$$X(HHTTTTT\cdots) = X(HTHHHHH\cdots) = \frac{3}{4}.$$

The numbers in $(0, 1)$ which are not dyadic rationals correspond to a single $\omega \in \Omega_\infty$; these numbers have a unique binary expansion.
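Whenever we place a probability measure $\mathbb{P}$ on $(\Omega, \mathcal{F})$, we have a corresponding induced measure $\mathcal{L}_X$ on $[0, 1]$. For example, if we set $p = q = \frac{1}{2}$ in the construction of this example, then we have
$$\begin{aligned}
\mathcal{L}_X\left[0, \tfrac{1}{2}\right] &= \mathbb{P}\{\text{First toss is } T\} = \tfrac{1}{2},\\
\mathcal{L}_X\left[\tfrac{1}{2}, 1\right] &= \mathbb{P}\{\text{First toss is } H\} = \tfrac{1}{2},\\
\mathcal{L}_X\left[0, \tfrac{1}{4}\right] &= \mathbb{P}\{\text{First two tosses are } TT\} = \tfrac{1}{4},\\
\mathcal{L}_X\left[\tfrac{1}{4}, \tfrac{1}{2}\right] &= \mathbb{P}\{\text{First two tosses are } TH\} = \tfrac{1}{4},\\
\mathcal{L}_X\left[\tfrac{1}{2}, \tfrac{3}{4}\right] &= \mathbb{P}\{\text{First two tosses are } HT\} = \tfrac{1}{4},\\
\mathcal{L}_X\left[\tfrac{3}{4}, 1\right] &= \mathbb{P}\{\text{First two tosses are } HH\} = \tfrac{1}{4}.
\end{aligned}$$

Continuing this process, we can verify that for any positive integers $k$ and $m$ satisfying
$$0 \le \frac{m-1}{2^k} < \frac{m}{2^k} \le 1,$$
we have
$$\mathcal{L}_X\left[\frac{m-1}{2^k}, \frac{m}{2^k}\right] = \frac{1}{2^k}.$$

In other words, the $\mathcal{L}_X$-measure of all intervals in $[0, 1]$ whose endpoints are dyadic rationals is the same as the Lebesgue measure of these intervals. The only way this can be is for $\mathcal{L}_X$ to be Lebesgue measure.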


It is interesting to consider what $\mathcal{L}_X$ would look like if we take a value of $p$ other than $\frac{1}{2}$ when we construct the probability measure $\mathbb{P}$ on $\Omega_\infty$.
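The dyadic-interval computation above is easy to mechanize, and running it with $p \ne 1/2$ gives a partial answer to the question just raised. Up to a $\mathbb{P}$-null set, $X \in \bigl[\frac{m-1}{2^k}, \frac{m}{2^k}\bigr]$ exactly when the first $k$ tosses spell the $k$-digit binary expansion of $m-1$ (reading $H$ as 1 and $T$ as 0), so the interval has $\mathcal{L}_X$-measure $p^{\#1}q^{\#0}$. With $p = 1/2$ every such interval gets $2^{-k}$; with the illustrative value $p = 1/3$, intervals of equal length get unequal mass, so $\mathcal{L}_X$ is then not Lebesgue measure. The function name is hypothetical.

```python
from fractions import Fraction

def dyadic_interval_measure(m, k, p):
    """L_X of [(m-1)/2^k, m/2^k] when P(H) = p, for 1 <= m <= 2^k.

    Up to a P-null set, X lies in this interval exactly when the first k tosses
    are the k-digit binary expansion of m - 1, read with H = 1 and T = 0.
    """
    q = 1 - p
    digits = format(m - 1, f"0{k}b")     # e.g. m = 4, k = 2 gives "11", i.e. HH
    return p ** digits.count("1") * q ** digits.count("0")

fair = [dyadic_interval_measure(m, 2, Fraction(1, 2)) for m in range(1, 5)]
biased = [dyadic_interval_measure(m, 2, Fraction(1, 3)) for m in range(1, 5)]
print(fair, biased)
```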


We conclude this example with another look at the Cantor set of Example 1.4. Let $\Omega_{\text{pairs}}$ be the subset of $\Omega_\infty$ in which every even-numbered toss is the same as the odd-numbered toss immediately preceding it. For example, $HHTTTTHH$ is the beginning of a sequence in $\Omega_{\text{pairs}}$, but $HT$ is not. Consider now the set of real numbers
$$C' \triangleq \{X(\omega);\ \omega \in \Omega_{\text{pairs}}\}.$$

The numbers in $\left(\frac{1}{4}, \frac{3}{4}\right)$ can be written as $X(\omega)$, but the sequence $\omega$ must begin with either $TH$ or $HT$. Therefore, none of these numbers is in $C'$. Similarly, the numbers in $\left(\frac{1}{16}, \frac{3}{16}\right)$ can be written as $X(\omega)$, but the sequence $\omega$ must begin with $TTTH$ or $TTHT$, so none of these numbers is in $C'$. Continuing this process, we see that $C'$ will not contain any of the numbers which were removed in the construction of the Cantor set $C$ in Example 1.4. In other words, $C' \subset C$. With a bit more work, one can convince oneself that in fact $C' = C$, i.e., by requiring consecutive coin tosses to be paired, we are removing exactly those points in $[0, 1]$ which were removed in the Cantor set construction of Example 1.4. □
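The claim $C' = C$ can at least be checked level by level. Each pair $TT$ or $HH$ contributes a base-4 digit $0$ or $3$ to $X$, so sequences whose first $2j$ tosses are paired give $2^j$ intervals of length $4^{-j}$. The sketch below compares these with the intervals left after $j$ rounds of removing the open middle half of each interval. That this is the construction of Example 1.4 is an assumption read off from the removed intervals $\left(\frac14, \frac34\right)$ and $\left(\frac1{16}, \frac3{16}\right)$ mentioned above. The helper names are hypothetical, and agreement at finitely many levels illustrates, but does not prove, $C' = C$.

```python
from fractions import Fraction
from itertools import product

def paired_intervals(j):
    """Intervals of X-values for sequences whose first 2j tosses are paired.

    Each pair TT or HH contributes a base-4 digit 0 or 3; whatever paired tosses
    follow add a number in [0, 4^-j].
    """
    intervals = []
    for digits in product((0, 3), repeat=j):
        left = sum((Fraction(dig, 4 ** (i + 1)) for i, dig in enumerate(digits)), Fraction(0))
        intervals.append((left, left + Fraction(1, 4 ** j)))
    return sorted(intervals)

def middle_half_removed(j):
    """Start from [0, 1]; j times, keep only the outer quarters of each interval."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(j):
        next_level = []
        for a, b in intervals:
            quarter = (b - a) / 4
            next_level += [(a, a + quarter), (b - quarter, b)]
        intervals = next_level
    return sorted(intervals)

print(all(paired_intervals(j) == middle_half_removed(j) for j in range(1, 7)))
```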
In addition to tossing a coin, another common random experiment is to pick a number, perhaps
using a random number generator. Here are some probability spaces which correspond to different 
ways of picking a number at random. 


Example 1.11

Suppose we choose a number from $\mathbb{R}$ in such a way that we are sure to get either 1, 4 or 16. Furthermore, we construct the experiment so that the probability of getting 1 is $\frac{1}{4}$, the probability of getting 4 is $\frac{1}{2}$ and the probability of getting 16 is $\frac{1}{4}$. We describe this random experiment by taking $\Omega$ to be $\mathbb{R}$, $\mathcal{F}$ to be $\mathcal{B}(\mathbb{R})$, and setting up the probability measure so that
$$\mathbb{P}\{1\} = \frac{1}{4},\qquad \mathbb{P}\{4\} = \frac{1}{2},\qquad \mathbb{P}\{16\} = \frac{1}{4}.$$

This determines $\mathbb{P}(A)$ for every set $A \in \mathcal{B}(\mathbb{R})$. For example, the probability of the interval $(0, 5]$ is $\frac{3}{4}$, because this interval contains the numbers 1 and 4, but not the number 16.

The probability measure described in this example is $\mathcal{L}_{S_2}$, the measure induced by the stock price $S_2$, when the initial stock price $S_0 = 4$ and the probability of $H$ is $\frac{1}{2}$. This distribution was discussed immediately following Definition 2.8. □
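The connection with the stock price can be reproduced by enumerating the two-period binomial tree. The text fixes $S_0 = 4$ and $\mathbb{P}(H) = 1/2$; the factors $u = 2$ and $d = 1/2$ are an assumption here, inferred from the three values $16$, $4$, $1$. The function name is hypothetical. A sketch:

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

def law_of_S2(S0, u, d, p):
    """Distribution of S_2 in the two-period binomial model, as {value: probability}."""
    q = 1 - p
    law = defaultdict(Fraction)          # missing keys start at Fraction(0)
    for tosses in product("HT", repeat=2):
        price = S0
        for toss in tosses:
            price *= u if toss == "H" else d
        law[price] += p ** tosses.count("H") * q ** tosses.count("T")
    return dict(law)

law = law_of_S2(S0=Fraction(4), u=Fraction(2), d=Fraction(1, 2), p=Fraction(1, 2))
prob_0_5 = sum(pr for s, pr in law.items() if 0 < s <= 5)   # P of the interval (0, 5]
print(law, prob_0_5)
```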


Example 1.12 Uniform distribution on $[0, 1]$.

Let $\Omega = [0, 1]$ and let $\mathcal{F} = \mathcal{B}([0, 1])$, the collection of all Borel subsets contained in $[0, 1]$. For each Borel set $A \subset [0, 1]$, we define $\mathbb{P}(A) \triangleq \mu_0(A)$ to be the Lebesgue measure of the set. Because $\mu_0[0, 1] = 1$, this gives us a probability measure.

This probability space corresponds to the random experiment of choosing a number from $[0, 1]$ so that every number is “equally likely” to be chosen. Since there are infinitely many numbers in $[0, 1]$, this requires that every number have probability zero of being chosen. Nonetheless, we can speak of the probability that the number chosen lies in a particular set, and if the set has uncountably many points, then this probability can be positive. □


I know of no way to design a physical experiment which corresponds to choosing a number at 
random from [0, 1] so that each number is equally likely to be chosen, just as I know of no way to 
toss a coin infinitely many times. Nonetheless, both Examples 1.10 and 1.12 provide probability 
spaces which are often useful approximations to reality. 


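Example 1.13 Standard normal distribution.

Define the standard normal density
$$\varphi(x) \triangleq \frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}.$$

Let $\Omega = \mathbb{R}$, $\mathcal{F} = \mathcal{B}(\mathbb{R})$ and for every Borel set $A \subset \mathbb{R}$, define
$$\mathbb{P}(A) \triangleq \int_A \varphi\,d\mu_0. \tag{4.2}$$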
If $A$ in (4.2) is an interval $[a, b]$, then we can write (4.2) as the less mysterious Riemann integral:
$$\mathbb{P}[a, b] \triangleq \int_a^b \frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}\,dx.$$

This corresponds to choosing a point at random on the real line, and every single point has probability zero of being chosen, but if a set $A$ is given, then the probability the point is in that set is given by (4.2). □
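For intervals, (4.2) can be evaluated either as the Riemann integral just written or through the standard normal distribution function $\Phi(x) = \frac12\bigl(1 + \operatorname{erf}(x/\sqrt2)\bigr)$, a standard identity not derived in the text. The sketch compares the two for $[-1, 1]$; the midpoint rule and its step count are arbitrary choices, and a single point gets probability zero, as stated above.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def prob_riemann(a, b, steps=10_000):
    """P[a, b] as the Riemann integral of phi, approximated by a midpoint sum."""
    h = (b - a) / steps
    return h * sum(phi(a + (i + 0.5) * h) for i in range(steps))

def prob_erf(a, b):
    """P[a, b] = Phi(b) - Phi(a), with Phi(x) = (1 + erf(x / sqrt(2))) / 2."""
    return (math.erf(b / math.sqrt(2)) - math.erf(a / math.sqrt(2))) / 2

print(prob_riemann(-1.0, 1.0), prob_erf(-1.0, 1.0), prob_erf(0.7, 0.7))
```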


The construction of the integral in a general probability space follows the same steps as the construction of the Lebesgue integral. We repeat this construction below.


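Definition 1.14 Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space, and let $X$ be a random variable on this space, i.e., a mapping from $\Omega$ to $\mathbb{R}$, possibly also taking the values $\pm\infty$.

• If $X$ is an indicator, i.e.,
$$X(\omega) = \mathbb{I}_A(\omega) = \begin{cases} 1, & \text{if } \omega \in A,\\ 0, & \text{if } \omega \notin A,\end{cases}$$
for some set $A \in \mathcal{F}$, we define
$$\int_\Omega X\,d\mathbb{P} \triangleq \mathbb{P}(A).$$

• If $X$ is a simple function, i.e.,
$$X(\omega) = \sum_{k=1}^{n} c_k \mathbb{I}_{A_k}(\omega),$$
where each $c_k$ is a real number and each $A_k$ is a set in $\mathcal{F}$, we define
$$\int_\Omega X\,d\mathbb{P} \triangleq \sum_{k=1}^{n} c_k \int_\Omega \mathbb{I}_{A_k}\,d\mathbb{P} = \sum_{k=1}^{n} c_k\,\mathbb{P}(A_k).$$

• If $X$ is nonnegative but otherwise general, we define
$$\int_\Omega X\,d\mathbb{P} \triangleq \sup\left\{\int_\Omega Y\,d\mathbb{P};\ Y \text{ is simple and } Y(\omega) \le X(\omega) \text{ for every } \omega \in \Omega\right\}.$$

In fact, we can always construct a sequence of simple functions $Y_n$, $n = 1, 2, \ldots$ such that
$$0 \le Y_1(\omega) \le Y_2(\omega) \le Y_3(\omega) \le \cdots \text{ for every } \omega \in \Omega,$$
and $X(\omega) = \lim_{n\to\infty} Y_n(\omega)$ for every $\omega \in \Omega$. With this sequence, we can define
$$\int_\Omega X\,d\mathbb{P} \triangleq \lim_{n\to\infty} \int_\Omega Y_n\,d\mathbb{P}.$$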
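• If $X$ is integrable, i.e.,
$$\int_\Omega X^+\,d\mathbb{P} < \infty,\qquad \int_\Omega X^-\,d\mathbb{P} < \infty,$$
where
$$X^+(\omega) \triangleq \max\{X(\omega), 0\},\qquad X^-(\omega) \triangleq \max\{-X(\omega), 0\},$$
then we define
$$\int_\Omega X\,d\mathbb{P} \triangleq \int_\Omega X^+\,d\mathbb{P} - \int_\Omega X^-\,d\mathbb{P}.$$

If $A$ is a set in $\mathcal{F}$ and $X$ is a random variable, we define
$$\int_A X\,d\mathbb{P} \triangleq \int_\Omega \mathbb{I}_A X\,d\mathbb{P}.$$

The expectation of a random variable $X$ is defined to be
$$\mathbb{E}X \triangleq \int_\Omega X\,d\mathbb{P}.$$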
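Definition 1.14 asserts that a nonnegative $X$ is always the pointwise limit of an increasing sequence of simple random variables. One standard choice, used here as an illustration rather than as the construction the text has in mind, is $Y_n = \min\bigl(n, \lfloor 2^n X\rfloor / 2^n\bigr)$. The sketch applies it on a hypothetical three-outcome probability space and checks that $\mathbb{E}Y_n$ increases toward $\mathbb{E}X$.

```python
import math
from fractions import Fraction

def dyadic_approx(x, n):
    """Y_n = min(n, floor(2^n x) / 2^n): simple, nondecreasing in n, and -> x."""
    return min(Fraction(n), Fraction(math.floor(x * 2 ** n), 2 ** n))

# Hypothetical finite probability space: outcome -> (P{w}, X(w)), with X >= 0.
SPACE = {"a": (Fraction(1, 4), Fraction(1, 3)),
         "b": (Fraction(1, 2), Fraction(5, 2)),
         "c": (Fraction(1, 4), Fraction(7))}

def expectation(g):
    """Integral of g(X) dP on SPACE: the sum of g(X(w)) * P{w}."""
    return sum((pw * g(x) for pw, x in SPACE.values()), Fraction(0))

EX = expectation(lambda x: x)
approx = [expectation(lambda x, n=n: dyadic_approx(x, n)) for n in range(1, 12)]
print(EX, [float(e) for e in approx])
```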


The above integral has all the linearity and comparison properties one would expect. In particular, if $X$ and $Y$ are random variables and $c$ is a real constant, then
$$\int_\Omega (X + Y)\,d\mathbb{P} = \int_\Omega X\,d\mathbb{P} + \int_\Omega Y\,d\mathbb{P},$$
$$\int_\Omega cX\,d\mathbb{P} = c\int_\Omega X\,d\mathbb{P}.$$

If $X(\omega) \le Y(\omega)$ for every $\omega \in \Omega$, then
$$\int_\Omega X\,d\mathbb{P} \le \int_\Omega Y\,d\mathbb{P}.$$

In fact, we don’t need to have $X(\omega) \le Y(\omega)$ for every $\omega \in \Omega$ in order to reach this conclusion; it is enough if the set of $\omega$ for which $X(\omega) \le Y(\omega)$ has probability one. When a condition holds with probability one, we say it holds almost surely. Finally, if $A$ and $B$ are disjoint sets in $\mathcal{F}$ and $X$ is a random variable, then
$$\int_{A \cup B} X\,d\mathbb{P} = \int_A X\,d\mathbb{P} + \int_B X\,d\mathbb{P}.$$


We restate the Lebesgue integral convergence theorems in this more general context. We acknowledge in these statements that conditions don’t need to hold for every $\omega$; almost surely is enough.


Theorem 4.4 (Fatou’s Lemma) Let $X_n$, $n = 1, 2, \ldots$ be a sequence of almost surely nonnegative random variables converging almost surely to a random variable $X$. Then
$$\int_\Omega X\,d\mathbb{P} \le \liminf_{n\to\infty} \int_\Omega X_n\,d\mathbb{P},$$
or equivalently,
$$\mathbb{E}X \le \liminf_{n\to\infty} \mathbb{E}X_n.$$
Theorem 4.5 (Monotone Convergence Theorem) Let $X_n$, $n = 1, 2, \ldots$ be a sequence of random variables converging almost surely to a random variable $X$. Assume that
$$0 \le X_1 \le X_2 \le X_3 \le \cdots \text{ almost surely}.$$

Then
$$\int_\Omega X\,d\mathbb{P} = \lim_{n\to\infty} \int_\Omega X_n\,d\mathbb{P},$$
or equivalently,
$$\mathbb{E}X = \lim_{n\to\infty} \mathbb{E}X_n.$$


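Theorem 4.6 (Dominated Convergence Theorem) Let $X_n$, $n = 1, 2, \ldots$ be a sequence of random variables, converging almost surely to a random variable $X$. Assume that there exists an integrable random variable $Y$ (i.e., $\mathbb{E}Y < \infty$) such that
$$|X_n| \le Y \text{ almost surely for every } n.$$

Then
$$\int_\Omega X\,d\mathbb{P} = \lim_{n\to\infty} \int_\Omega X_n\,d\mathbb{P},$$
or equivalently,
$$\mathbb{E}X = \lim_{n\to\infty} \mathbb{E}X_n.$$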


In Example 1.13, we constructed a probability measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ by integrating the standard normal density. In fact, whenever $\varphi$ is a nonnegative function defined on $\mathbb{R}$ satisfying $\int_{\mathbb{R}} \varphi\,d\mu_0 = 1$, we call $\varphi$ a density and we can define an associated probability measure by
$$\mathbb{P}(A) \triangleq \int_A \varphi\,d\mu_0 \text{ for every } A \in \mathcal{B}(\mathbb{R}). \tag{4.3}$$

We shall often have a situation in which two measures are related by an equation like (4.3). In fact, the market measure and the risk-neutral measures in financial markets are related this way. We say that $\varphi$ in (4.3) is the Radon-Nikodym derivative of $\mathbb{P}$ with respect to $\mu_0$, and we write
$$\varphi = \frac{d\mathbb{P}}{d\mu_0}. \tag{4.4}$$


The probability measure $\mathbb{P}$ weights different parts of the real line according to the density $\varphi$. Now suppose $f$ is a function on $(\mathbb{R}, \mathcal{B}(\mathbb{R}), \mathbb{P})$. Definition 1.14 gives us a value for the abstract integral
$$\int_{\mathbb{R}} f\,d\mathbb{P}.$$
We can also evaluate
$$\int_{\mathbb{R}} f\varphi\,d\mu_0,$$
which is an integral with respect to Lebesgue measure over the real line. We want to show that
$$\int_{\mathbb{R}} f\,d\mathbb{P} = \int_{\mathbb{R}} f\varphi\,d\mu_0, \tag{4.5}$$
an equation which is suggested by the notation introduced in (4.4) (substitute $\frac{d\mathbb{P}}{d\mu_0}$ for $\varphi$ in (4.5) and “cancel” the $d\mu_0$). We include a proof of this because it allows us to illustrate the concept of the standard machine explained in Williams’s book in Section 5.12, page 5.


The standard machine argument proceeds in four steps. 


Step 1. Assume that $f$ is an indicator function, i.e., $f(x) = \mathbb{I}_A(x)$ for some Borel set $A \subset \mathbb{R}$. In that case, (4.5) becomes
$$\mathbb{P}(A) = \int_A \varphi\,d\mu_0.$$
This is true because it is the definition of $\mathbb{P}(A)$.


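Step 2. Now that we know that (4.5) holds when $f$ is an indicator function, assume that $f$ is a simple function, i.e., a linear combination of indicator functions. In other words,
$$f(x) = \sum_{k=1}^{n} c_k h_k(x),$$
where each $c_k$ is a real number and each $h_k$ is an indicator function. Then, using Step 1 for each $h_k$,
$$\begin{aligned}
\int_{\mathbb{R}} f\,d\mathbb{P} &= \int_{\mathbb{R}} \left[\sum_{k=1}^{n} c_k h_k\right] d\mathbb{P} = \sum_{k=1}^{n} c_k \int_{\mathbb{R}} h_k\,d\mathbb{P}\\
&= \sum_{k=1}^{n} c_k \int_{\mathbb{R}} h_k\varphi\,d\mu_0 = \int_{\mathbb{R}} \left[\sum_{k=1}^{n} c_k h_k\right] \varphi\,d\mu_0 = \int_{\mathbb{R}} f\varphi\,d\mu_0.
\end{aligned}$$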


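Step 3. Now that we know that (4.5) holds when $f$ is a simple function, we consider a general nonnegative function $f$. We can always construct a sequence of nonnegative simple functions $f_n$, $n = 1, 2, \ldots$ such that
$$0 \le f_1(x) \le f_2(x) \le f_3(x) \le \cdots \text{ for every } x \in \mathbb{R},$$
and $f(x) = \lim_{n\to\infty} f_n(x)$ for every $x \in \mathbb{R}$. We have already proved that
$$\int_{\mathbb{R}} f_n\,d\mathbb{P} = \int_{\mathbb{R}} f_n\varphi\,d\mu_0 \text{ for every } n.$$

Because $\varphi \ge 0$, the products $f_n\varphi$ also increase pointwise to $f\varphi$. We let $n \to \infty$ and use the Monotone Convergence Theorem on both sides of this equality to get
$$\int_{\mathbb{R}} f\,d\mathbb{P} = \int_{\mathbb{R}} f\varphi\,d\mu_0.$$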
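Step 4. In the last step, we consider an integrable function $f$, which can take both positive and negative values. By integrable, we mean that
$$\int_{\mathbb{R}} f^+\,d\mathbb{P} < \infty,\qquad \int_{\mathbb{R}} f^-\,d\mathbb{P} < \infty.$$

From Step 3, we have
$$\int_{\mathbb{R}} f^+\,d\mathbb{P} = \int_{\mathbb{R}} f^+\varphi\,d\mu_0,$$
$$\int_{\mathbb{R}} f^-\,d\mathbb{P} = \int_{\mathbb{R}} f^-\varphi\,d\mu_0.$$

Subtracting these two equations, we obtain the desired result:
$$\int_{\mathbb{R}} f\,d\mathbb{P} = \int_{\mathbb{R}} f^+\,d\mathbb{P} - \int_{\mathbb{R}} f^-\,d\mathbb{P} = \int_{\mathbb{R}} f^+\varphi\,d\mu_0 - \int_{\mathbb{R}} f^-\varphi\,d\mu_0 = \int_{\mathbb{R}} f\varphi\,d\mu_0.$$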
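Equation (4.5) can be checked numerically for a simple function, mirroring Step 2. In the sketch below, $\varphi$ is the standard normal density of Example 1.13, the coefficients and intervals of $f$ are hypothetical, the left side uses $\mathbb{P}[a,b]$ from the closed-form normal distribution function, and the right side approximates $\int_{\mathbb{R}} f\varphi\,d\mu_0$ by a midpoint sum. Half-open versus closed intervals makes no difference here, since single points have probability zero.

```python
import math

def phi(x):
    """Standard normal density: dP/dmu_0 for the measure of Example 1.13."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def P_interval(a, b):
    """P[a, b] for that measure, via Phi(x) = (1 + erf(x / sqrt(2))) / 2."""
    return (math.erf(b / math.sqrt(2)) - math.erf(a / math.sqrt(2))) / 2

# Hypothetical simple function f = sum_k c_k * I_{[a_k, b_k)} on disjoint intervals.
TERMS = [(3.0, -2.0, -0.5), (-1.0, -0.5, 1.0), (0.5, 1.0, 4.0)]

def f(x):
    return sum(c for c, a, b in TERMS if a <= x < b)

def lhs():
    """Integral of f dP: the sum of c_k * P(A_k) (Definition 1.14 for simple functions)."""
    return sum(c * P_interval(a, b) for c, a, b in TERMS)

def rhs(steps=60_000, lo=-2.0, hi=4.0):
    """Integral of f * phi d(mu_0), approximated by a midpoint sum over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) * phi(lo + (i + 0.5) * h) for i in range(steps))

print(lhs(), rhs())
```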


1.5 Independence 


In this section, we define and discuss the notion of independence in a general probability space $(\Omega, \mathcal{F}, \mathbb{P})$, although most of the examples we give will be for coin toss space.


1.5.1 Independence of sets 


Definition 1.15 We say that two sets $A \in \mathcal{F}$ and $B \in \mathcal{F}$ are independent if
$$\mathbb{P}(A \cap B) = \mathbb{P}(A)\,\mathbb{P}(B).$$


Suppose a random experiment is conducted, and $\omega$ is the outcome. The probability that $\omega \in A$ is $\mathbb{P}(A)$. Suppose you are not told $\omega$, but you are told that $\omega \in B$. Conditional on this information (and assuming $\mathbb{P}(B) > 0$), the probability that $\omega \in A$ is
$$\mathbb{P}(A|B) \triangleq \frac{\mathbb{P}(A \cap B)}{\mathbb{P}(B)}.$$

The sets $A$ and $B$ are independent if and only if this conditional probability is the unconditional probability $\mathbb{P}(A)$, i.e., knowing that $\omega \in B$ does not change the probability you assign to $A$. This discussion is symmetric with respect to $A$ and $B$; if $A$ and $B$ are independent and you know that $\omega \in A$, the conditional probability you assign to $B$ is still the unconditional probability $\mathbb{P}(B)$.


Whether two sets are independent depends on the probability measure $\mathbb{P}$. For example, suppose we toss a coin twice, with probability $p$ for $H$ and probability $q = 1 - p$ for $T$ on each toss. To avoid trivialities, we assume that $0 < p < 1$. Then
$$\mathbb{P}\{HH\} = p^2,\qquad \mathbb{P}\{HT\} = \mathbb{P}\{TH\} = pq,\qquad \mathbb{P}\{TT\} = q^2. \tag{5.1}$$




Let $A = \{HH, HT\}$ and $B = \{HT, TH\}$. In words, $A$ is the set “$H$ on the first toss” and $B$ is the set “one $H$ and one $T$.” Then $A \cap B = \{HT\}$. We compute
$$\mathbb{P}(A) = p^2 + pq = p,\qquad \mathbb{P}(B) = 2pq,\qquad \mathbb{P}(A)\,\mathbb{P}(B) = 2p^2q,\qquad \mathbb{P}(A \cap B) = pq.$$

These sets are independent if and only if $2p^2q = pq$, which is the case if and only if $p = \frac{1}{2}$.

If $p = \frac{1}{2}$, then $\mathbb{P}(B)$, the probability of one head and one tail, is $\frac{1}{2}$. If you are told that the coin tosses resulted in a head on the first toss, the probability of $B$, which is now the probability of a $T$ on the second toss, is still $\frac{1}{2}$.


Suppose however that $p = 0.01$. By far the most likely outcome of the two coin tosses is $TT$, and the probability of one head and one tail is quite small; in fact, $\mathbb{P}(B) = 0.0198$. However, if you are told that the first toss resulted in $H$, it becomes very likely that the two tosses result in one head and one tail. In fact, conditioned on getting an $H$ on the first toss, the probability of one $H$ and one $T$ is the probability of a $T$ on the second toss, which is $0.99$.
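The two cases just discussed can be reproduced exactly from (5.1). A sketch using exact fractions; $A$ and $B$ are the sets defined above, and the helper names are hypothetical.

```python
from fractions import Fraction

def two_toss_probs(p):
    """P{w} for w in Omega_2, from (5.1)."""
    q = 1 - p
    return {"HH": p * p, "HT": p * q, "TH": q * p, "TT": q * q}

def prob(event, probs):
    return sum((probs[w] for w in event), Fraction(0))

A = {"HH", "HT"}   # H on the first toss
B = {"HT", "TH"}   # one H and one T

for p in (Fraction(1, 2), Fraction(1, 100)):
    probs = two_toss_probs(p)
    p_A, p_B, p_AB = prob(A, probs), prob(B, probs), prob(A & B, probs)
    # Independent iff P(A & B) == P(A) P(B); the last column is P(B | A) = P(A & B) / P(A).
    print(p, p_AB == p_A * p_B, p_B, p_AB / p_A)
```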


1.5.2 Independence of σ-algebras


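Definition 1.16 Let $\mathcal{G}$ and $\mathcal{H}$ be sub-σ-algebras of $\mathcal{F}$. We say that $\mathcal{G}$ and $\mathcal{H}$ are independent if every set in $\mathcal{G}$ is independent of every set in $\mathcal{H}$, i.e.,
$$\mathbb{P}(A \cap B) = \mathbb{P}(A)\,\mathbb{P}(B) \text{ for every } A \in \mathcal{H},\ B \in \mathcal{G}.$$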


Example 1.14 Toss a coin twice, and let $\mathbb{P}$ be given by (5.1). Let $\mathcal{G} = \mathcal{F}_1$ be the σ-algebra determined by the first toss: $\mathcal{G}$ contains the sets
$$\emptyset,\quad \Omega,\quad \{HH, HT\},\quad \{TH, TT\}.$$
Let $\mathcal{H}$ be the σ-algebra determined by the second toss: $\mathcal{H}$ contains the sets
$$\emptyset,\quad \Omega,\quad \{HH, TH\},\quad \{HT, TT\}.$$

These two σ-algebras are independent. For example, if we choose the set $\{HH, HT\}$ from $\mathcal{G}$ and the set $\{HH, TH\}$ from $\mathcal{H}$, then we have
$$\mathbb{P}\{HH, HT\}\,\mathbb{P}\{HH, TH\} = (p^2 + pq)(p^2 + pq) = p^2,$$
$$\mathbb{P}(\{HH, HT\} \cap \{HH, TH\}) = \mathbb{P}\{HH\} = p^2.$$

No matter which set we choose in $\mathcal{G}$ and which set we choose in $\mathcal{H}$, we will find that the product of the probabilities is the probability of the intersection.
Example 1.14 illustrates the general principle that when the probability for a sequence of tosses is
defined to be the product of the probabilities for the individual tosses of the sequence, then every 
set depending on a particular toss will be independent of every set depending on a different toss. 
We say that the different tosses are independent when we construct probabilities this way. It is also 
possible to construct probabilities such that the different tosses are not independent, as shown by 
the following example. 


Example 1.15 Define $\mathbb{P}$ for the individual elements of $\Omega = \{HH, HT, TH, TT\}$ to be
$$\mathbb{P}\{HH\} = \frac{1}{3},\qquad \mathbb{P}\{HT\} = \mathbb{P}\{TH\} = \frac{1}{6},\qquad \mathbb{P}\{TT\} = \frac{1}{3},$$
and for every set $A \subset \Omega$, define $\mathbb{P}(A)$ to be the sum of the probabilities of the elements in $A$. Then $\mathbb{P}(\Omega) = 1$, so $\mathbb{P}$ is a probability measure. Note that the sets $\{H \text{ on first toss}\} = \{HH, HT\}$ and $\{H \text{ on second toss}\} = \{HH, TH\}$ have probabilities $\mathbb{P}\{HH, HT\} = \frac{1}{2}$ and $\mathbb{P}\{HH, TH\} = \frac{1}{2}$, so the product of the probabilities is $\frac{1}{4}$. On the other hand, the intersection of $\{HH, HT\}$ and $\{HH, TH\}$ contains the single element $HH$, which has probability $\frac{1}{3}$. These sets are not independent.
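Independence of the two σ-algebras can be checked exhaustively, since each contains only four sets. In the sketch below (hypothetical helper names), the product measure (5.1) of Example 1.14, here with the arbitrary choice $p = 1/3$, passes every check, while the measure $\mathbb{P}\{HH\} = \mathbb{P}\{TT\} = \frac13$, $\mathbb{P}\{HT\} = \mathbb{P}\{TH\} = \frac16$ of Example 1.15 does not.

```python
from fractions import Fraction
from itertools import product

OMEGA = frozenset({"HH", "HT", "TH", "TT"})
G = [frozenset(), OMEGA, frozenset({"HH", "HT"}), frozenset({"TH", "TT"})]  # first toss
H = [frozenset(), OMEGA, frozenset({"HH", "TH"}), frozenset({"HT", "TT"})]  # second toss

def independent(probs):
    """True if P(A & B) == P(A) P(B) for every A in G and every B in H."""
    def P(event):
        return sum((probs[w] for w in event), Fraction(0))
    return all(P(A & B) == P(A) * P(B) for A, B in product(G, H))

p = Fraction(1, 3)
q = 1 - p
product_measure = {"HH": p * p, "HT": p * q, "TH": q * p, "TT": q * q}     # (5.1)
example_1_15 = {"HH": Fraction(1, 3), "HT": Fraction(1, 6),
                "TH": Fraction(1, 6), "TT": Fraction(1, 3)}

print(independent(product_measure), independent(example_1_15))
```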


1.5.3 Independence of random variables 


Definition 1.17 We say that two random variables $X$ and $Y$ are independent if the σ-algebras they generate, $\sigma(X)$ and $\sigma(Y)$, are independent.


In the probability space of three independent coin tosses, the price $S_2$ of the stock at time 2 is independent of $\frac{S_3}{S_2}$. This is because $S_2$ depends on only the first two coin tosses, whereas $\frac{S_3}{S_2}$ is either $u$ or $d$, depending on whether the third coin toss is $H$ or $T$.

Definition 1.17 says that for independent random variables $X$ and $Y$, every set defined in terms of $X$ is independent of every set defined in terms of $Y$. In the case of $S_2$ and $\frac{S_3}{S_2}$ just considered, for example, the sets
$$\{S_2 = udS_0\} = \{HTH, HTT, THH, THT\} \quad\text{and}\quad \left\{\frac{S_3}{S_2} = u\right\} = \{HHH, HTH, THH, TTH\}$$
are independent sets.
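A brute-force check over the eight outcomes of three tosses, under illustrative assumptions not fixed by the text at this point ($S_0 = 4$, $u = 2$, $d = \frac12$, $p = \frac13$). It confirms the two sets displayed above and then verifies $\mathbb{P}\{S_2 = s,\ \frac{S_3}{S_2} = r\} = \mathbb{P}\{S_2 = s\}\,\mathbb{P}\{\frac{S_3}{S_2} = r\}$ for every pair of values, which on this finite space is enough for the independence of $\sigma(S_2)$ and $\sigma(S_3/S_2)$. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

S0, u, d = Fraction(4), Fraction(2), Fraction(1, 2)   # assumed model parameters
p = Fraction(1, 3)                                    # assumed P(H)
q = 1 - p
OMEGA = ["".join(w) for w in product("HT", repeat=3)]

def prob(event):
    return sum((p ** w.count("H") * q ** w.count("T") for w in event), Fraction(0))

def S(w, n):
    """Stock price S_n along the toss sequence w."""
    price = S0
    for toss in w[:n]:
        price *= u if toss == "H" else d
    return price

def S2(w):
    return S(w, 2)

def ratio(w):
    return S(w, 3) / S(w, 2)

A = {w for w in OMEGA if S2(w) == u * d * S0}   # {S_2 = u d S_0}
B = {w for w in OMEGA if ratio(w) == u}         # {S_3 / S_2 = u}
print(sorted(A), sorted(B), prob(A & B) == prob(A) * prob(B))

# Level sets {S_2 = s} and {S_3 / S_2 = r} for every attained value s and r.
levels_S2 = {s: {w for w in OMEGA if S2(w) == s} for s in map(S2, OMEGA)}
levels_ratio = {r: {w for w in OMEGA if ratio(w) == r} for r in map(ratio, OMEGA)}
print(all(prob(E & F) == prob(E) * prob(F)
          for E in levels_S2.values() for F in levels_ratio.values()))
```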


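Suppose $X$ and $Y$ are independent random variables. We defined earlier the measure induced by $X$ on $\mathbb{R}$ to be
$$\mathcal{L}_X(A) \triangleq \mathbb{P}\{X \in A\},\quad A \subset \mathbb{R}.$$

Similarly, the measure induced by $Y$ is
$$\mathcal{L}_Y(B) \triangleq \mathbb{P}\{Y \in B\},\quad B \subset \mathbb{R}.$$

Now the pair $(X, Y)$ takes values in the plane $\mathbb{R}^2$, and we can define the measure induced by the pair
$$\mathcal{L}_{X,Y}(C) = \mathbb{P}\{(X, Y) \in C\},\quad C \subset \mathbb{R}^2.$$
The set $C$ in this last equation is a subset of the plane $\mathbb{R}^2$. In particular, $C$ could be a “rectangle”, i.e., a set of the form $A \times B$, where $A \subset \mathbb{R}$ and $B \subset \mathbb{R}$. In this case,
$$\{(X, Y) \in A \times B\} = \{X \in A\} \cap \{Y \in B\},$$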
and $X$ and $Y$ are independent if and only if
$$\begin{aligned}
\mathcal{L}_{X,Y}(A \times B) &= \mathbb{P}(\{X \in A\} \cap \{Y \in B\})\\
&= \mathbb{P}\{X \in A\}\,\mathbb{P}\{Y \in B\}\\
&= \mathcal{L}_X(A)\,\mathcal{L}_Y(B).
\end{aligned}\tag{5.2}$$

In other words, for independent random variables $X$ and $Y$, the joint distribution represented by the measure $\mathcal{L}_{X,Y}$ factors into the product of the marginal distributions represented by the measures $\mathcal{L}_X$ and $\mathcal{L}_Y$.


A joint density for $(X, Y)$ is a nonnegative function $f_{X,Y}(x, y)$ such that
$$\mathcal{L}_{X,Y}(A \times B) = \int_A \int_B f_{X,Y}(x, y)\,dy\,dx.$$


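Not every pair of random variables $(X, Y)$ has a joint density, but if a pair does, then the random variables $X$ and $Y$ have marginal densities defined by
$$f_X(x) \triangleq \int_{-\infty}^{\infty} f_{X,Y}(x, \eta)\,d\eta,\qquad f_Y(y) \triangleq \int_{-\infty}^{\infty} f_{X,Y}(\xi, y)\,d\xi.$$

These have the properties
$$\mathcal{L}_X(A) = \int_A f_X(x)\,dx,\quad A \subset \mathbb{R},$$
$$\mathcal{L}_Y(B) = \int_B f_Y(y)\,dy,\quad B \subset \mathbb{R}.$$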


Suppose X and Y have a joint density. Then X and Y are independent variables if and only if 
the joint density is the product of the marginal densities. This follows from the fact that (5.2) is 
equivalent to independence of $X$ and $Y$. Take $A = (-\infty, x]$ and $B = (-\infty, y]$, write (5.2) in terms
of densities, and differentiate with respect to both x and y. 


Theorem 5.7 Suppose $X$ and $Y$ are independent random variables. Let $g$ and $h$ be functions from $\mathbb{R}$ to $\mathbb{R}$. Then $g(X)$ and $h(Y)$ are also independent random variables.


PROOF: Let us denote W = g(X) and Z = h(Y). We must consider sets in o(W) and o(Z). But 
a typical set in a (W) is of the form 


{w; Ww) € A} = {w : g(X(w)) € A}, 


which is defined in terms of the random variable X. Therefore, this set is in v(X). (In general, 
we have that every set in o(W) is also in o(.X), which means that X contains at least as much 
information as W. In fact, X can contain strictly more information than W, which means that o (X) 
will contain all the sets in o (W) and others besides; this is the case, for example, if W = X 2 


In the same way that we just argued that every set in o(W) is also in v(X), we can show that 
every set in o (Z) is also in o (Y). Since every set in o (X) is independent of every set in o (Y), we 
conclude that every set in o (W) is independent of every set in o (Z). o 
Definition 1.18 Let $X_1, X_2, \ldots$ be a sequence of random variables. We say that these random variables are independent if for every sequence of sets $A_1 \in \sigma(X_1), A_2 \in \sigma(X_2), \ldots$ and for every positive integer $n$,
$$\mathbb{P}(A_1 \cap A_2 \cap \cdots \cap A_n) = \mathbb{P}(A_1)\,\mathbb{P}(A_2) \cdots \mathbb{P}(A_n).$$


1.5.4 Correlation and independence 


Theorem 5.8 If two random variables $X$ and $Y$ are independent, and if $g$ and $h$ are functions from $\mathbb{R}$ to $\mathbb{R}$, then
$$\mathbb{E}[g(X)h(Y)] = \mathbb{E}g(X) \cdot \mathbb{E}h(Y),$$
provided all the expectations are defined.

PROOF: Let $g(x) = \mathbb{I}_A(x)$ and $h(y) = \mathbb{I}_B(y)$ be indicator functions. Then the equation we are trying to prove becomes
$$\mathbb{P}(\{X \in A\} \cap \{Y \in B\}) = \mathbb{P}\{X \in A\}\,\mathbb{P}\{Y \in B\},$$
which is true because $X$ and $Y$ are independent. Now use the standard machine to get the result for general functions $g$ and $h$. □


The variance of a random variable $X$ is defined to be
$$\mathrm{Var}(X) \triangleq \mathbb{E}[(X - \mathbb{E}X)^2].$$
The covariance of two random variables $X$ and $Y$ is defined to be
$$\mathrm{Cov}(X, Y) \triangleq \mathbb{E}[(X - \mathbb{E}X)(Y - \mathbb{E}Y)] = \mathbb{E}[XY] - \mathbb{E}X \cdot \mathbb{E}Y.$$
According to Theorem 5.8, for independent random variables, the covariance is zero. If $X$ and $Y$ both have positive variances, we define their correlation coefficient
$$\rho(X, Y) \triangleq \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}.$$

For independent random variables, the correlation coefficient is zero.


Unfortunately, two random variables can have zero correlation and still not be independent. Consider the following example.

Example 1.16 Let $X$ be a standard normal random variable, let $Z$ be independent of $X$ and have the distribution $\mathbb{P}\{Z = 1\} = \mathbb{P}\{Z = -1\} = \frac{1}{2}$. Define $Y = XZ$. We show that $Y$ is also a standard normal random variable, $X$ and $Y$ are uncorrelated, but $X$ and $Y$ are not independent.

The last claim is easy to see. If $X$ and $Y$ were independent, so would be $X^2$ and $Y^2$, but in fact, $X^2 = Y^2$ almost surely; since $X^2$ is not almost surely constant, it cannot be independent of itself.

We next check that $Y$ is standard normal. For $y \in \mathbb{R}$, we have
$$\begin{aligned}
\mathbb{P}\{Y \le y\} &= \mathbb{P}\{Y \le y \text{ and } Z = 1\} + \mathbb{P}\{Y \le y \text{ and } Z = -1\} \\
&= \mathbb{P}\{X \le y \text{ and } Z = 1\} + \mathbb{P}\{-X \le y \text{ and } Z = -1\} \\
&= \mathbb{P}\{X \le y\}\,\mathbb{P}\{Z = 1\} + \mathbb{P}\{-X \le y\}\,\mathbb{P}\{Z = -1\} \\
&= \tfrac{1}{2}\mathbb{P}\{X \le y\} + \tfrac{1}{2}\mathbb{P}\{-X \le y\}.
\end{aligned}$$
Since $X$ is standard normal, $\mathbb{P}\{-X \le y\} = \mathbb{P}\{X \le y\}$, and we have $\mathbb{P}\{Y \le y\} = \mathbb{P}\{X \le y\}$, which shows that $Y$ is also standard normal.

Being standard normal, both $X$ and $Y$ have expected value zero. Therefore,
$$\mathrm{Cov}(X, Y) = \mathbb{E}[XY] = \mathbb{E}[X^2 Z] = \mathbb{E}X^2 \cdot \mathbb{E}Z = 1 \cdot 0 = 0.$$
Where in $\mathbb{R}^2$ does the measure $\mathcal{L}_{X,Y}$ put its mass, i.e., what is the distribution of $(X, Y)$?
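A quick Monte Carlo sketch of Example 1.16 illustrates all three claims. The sample size and seed are arbitrary choices. The sample correlation of $X$ and $Y$ comes out near zero, the empirical value of $\mathbb{P}\{Y \le 1\}$ matches the standard normal value, and yet $X^2 = Y^2$ on every draw.

```python
import math
import random

random.seed(0)        # arbitrary seed, fixed for reproducibility
N = 100_000

xs = [random.gauss(0.0, 1.0) for _ in range(N)]     # X ~ N(0, 1)
zs = [random.choice((-1, 1)) for _ in range(N)]     # P{Z = 1} = P{Z = -1} = 1/2
ys = [x * z for x, z in zip(xs, zs)]                # Y = X Z

def mean(v):
    return sum(v) / len(v)

def corr(a, b):
    """Sample correlation coefficient of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    cov = mean([(x - ma) * (y - mb) for x, y in zip(a, b)])
    va = mean([(x - ma) ** 2 for x in a])
    vb = mean([(y - mb) ** 2 for y in b])
    return cov / math.sqrt(va * vb)

rho = corr(xs, ys)
phi_1 = 0.5 * (1 + math.erf(1 / math.sqrt(2)))       # Phi(1) = P{N(0,1) <= 1}
frac_y = mean([1.0 if y <= 1 else 0.0 for y in ys])  # empirical P{Y <= 1}

print(abs(rho) < 0.02)                                # uncorrelated, up to sampling error
print(abs(frac_y - phi_1) < 0.01)                     # Y looks standard normal
print(all(x * x == y * y for x, y in zip(xs, ys)))    # X^2 = Y^2 on every draw
```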


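We conclude this section with the observation that for independent random variables, the variance of their sum is the sum of their variances. Indeed, if $X$ and $Y$ are independent and $Z = X + Y$, then
$$\begin{aligned}
\mathrm{Var}(Z) &\triangleq \mathbb{E}[(Z - \mathbb{E}Z)^2] \\
&= \mathbb{E}[(X + Y - \mathbb{E}X - \mathbb{E}Y)^2] \\
&= \mathbb{E}[(X - \mathbb{E}X)^2 + 2(X - \mathbb{E}X)(Y - \mathbb{E}Y) + (Y - \mathbb{E}Y)^2] \\
&= \mathrm{Var}(X) + 2\,\mathbb{E}[X - \mathbb{E}X]\,\mathbb{E}[Y - \mathbb{E}Y] + \mathrm{Var}(Y) \\
&= \mathrm{Var}(X) + \mathrm{Var}(Y).
\end{aligned}$$

This argument extends to any finite number of random variables. If we are given independent random variables $X_1, X_2, \ldots, X_n$, then
$$\mathrm{Var}(X_1 + X_2 + \cdots + X_n) = \mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \cdots + \mathrm{Var}(X_n). \qquad (5.3)$$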
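Identity (5.3) can be checked exactly for small discrete distributions. The two distributions below are made up for illustration. The sketch builds the law of $X + Y$ from the product (independent) joint law and compares variances with exact fractions.

```python
from fractions import Fraction
from itertools import product

# Two made-up independent discrete random variables, as {value: probability}.
X = {0: Fraction(1, 2), 1: Fraction(1, 3), 5: Fraction(1, 6)}
Y = {-1: Fraction(1, 4), 2: Fraction(3, 4)}

def mean(dist):
    return sum(v * pr for v, pr in dist.items())

def var(dist):
    m = mean(dist)
    return sum((v - m) ** 2 * pr for v, pr in dist.items())

# Law of Z = X + Y under independence: P{X = x, Y = y} = P{X = x} P{Y = y}.
Z = {}
for (x, px), (y, py) in product(X.items(), Y.items()):
    Z[x + y] = Z.get(x + y, 0) + px * py

print(var(Z) == var(X) + var(Y))     # True
```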


1.5.5 Independence and conditional expectation. 


We now return to property (k) for conditional expectations, presented in the lecture dated October 19, 1995. The property as stated there is taken from Williams's book, page 88; we shall need only the second assertion of the property:

(k) If a random variable $X$ is independent of a $\sigma$-algebra $\mathcal{H}$, then
$$\mathbb{E}[X \mid \mathcal{H}] = \mathbb{E}X.$$

The point of this statement is that if $X$ is independent of $\mathcal{H}$, then the best estimate of $X$ based on the information in $\mathcal{H}$ is $\mathbb{E}X$, the same as the best estimate of $X$ based on no information.

To show this equality, we observe first that $\mathbb{E}X$ is $\mathcal{H}$-measurable, since it is not random. We must also check the partial averaging property
$$\int_A \mathbb{E}X \, d\mathbb{P} = \int_A X \, d\mathbb{P} \quad \text{for every } A \in \mathcal{H}.$$
If $X$ is an indicator of some set $B$, which by assumption must be independent of $\mathcal{H}$, then the partial averaging equation we must check is
$$\int_A \mathbb{P}(B) \, d\mathbb{P} = \int_A \mathbb{I}_B \, d\mathbb{P}.$$
The left-hand side of this equation is $\mathbb{P}(A)\mathbb{P}(B)$, and the right-hand side is
$$\int_\Omega \mathbb{I}_A \mathbb{I}_B \, d\mathbb{P} = \int_\Omega \mathbb{I}_{A \cap B} \, d\mathbb{P} = \mathbb{P}(A \cap B).$$
The partial averaging equation holds because $A$ and $B$ are independent. The partial averaging equation for general $X$ independent of $\mathcal{H}$ follows by the standard machine.
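On a finite space, $\mathbb{E}[X \mid \mathcal{H}](\omega)$ is the $\mathbb{P}$-average of $X$ over the smallest set of $\mathcal{H}$ containing $\omega$. The minimal sketch below assumes $\mathbb{P}(H) = 2/5$ and two tosses. It uses a made-up $X$ that depends only on the second toss, so $X$ is independent of $\mathcal{H} = \sigma(\text{first toss})$. The conditional expectation comes out equal to the constant $\mathbb{E}X$ everywhere.

```python
from fractions import Fraction
from itertools import product

p = Fraction(2, 5)   # hypothetical probability of H
prob = {w: p ** w.count("H") * (1 - p) ** w.count("T")
        for w in ("".join(t) for t in product("HT", repeat=2))}

# A made-up X that depends only on the second toss, hence is independent of
# H = sigma(first toss).
X = {w: (3 if w[1] == "H" else -1) for w in prob}

# Smallest nonempty sets of H: the outcomes grouped by their first toss.
atoms = [{w for w in prob if w[0] == c} for c in "HT"]

def cond_exp(X, atoms):
    """E[X | H] on a finite space: the P-average of X over the atom containing w."""
    out = {}
    for A in atoms:
        avg = sum(X[w] * prob[w] for w in A) / sum(prob[w] for w in A)
        for w in A:
            out[w] = avg
    return out

EX = sum(X[w] * prob[w] for w in prob)
CE = cond_exp(X, atoms)
print(EX, all(CE[w] == EX for w in prob))   # 3/5 True
```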


1.5.6 Law of Large Numbers 


There are two fundamental theorems about sequences of independent random variables. Here is the 
first one. 


Theorem 5.9 (Law of Large Numbers) Let $X_1, X_2, \ldots$ be a sequence of independent, identically distributed random variables, each with expected value $\mu$ and variance $\sigma^2$. Define the sequence of averages
$$Y_n \triangleq \frac{X_1 + X_2 + \cdots + X_n}{n}, \quad n = 1, 2, \ldots.$$
Then $Y_n$ converges to $\mu$ almost surely as $n \to \infty$.

We are not going to give the proof of this theorem, but here is an argument which makes it plausible. We will use this argument later when developing stochastic calculus. The argument proceeds in two steps. We first check that $\mathbb{E}Y_n = \mu$ for every $n$. We next check that $\mathrm{Var}(Y_n) \to 0$ as $n \to \infty$. In other words, the random variables $Y_n$ are increasingly tightly distributed around $\mu$ as $n \to \infty$.

For the first step, we simply compute
$$\mathbb{E}Y_n = \frac{1}{n}\mathbb{E}[X_1 + X_2 + \cdots + X_n] = \frac{1}{n}\underbrace{(\mu + \mu + \cdots + \mu)}_{n \text{ times}} = \mu.$$

For the second step, we first recall from (5.3) that the variance of the sum of independent random variables is the sum of their variances. Therefore,
$$\mathrm{Var}(Y_n) = \frac{1}{n^2}\mathrm{Var}(X_1 + X_2 + \cdots + X_n) = \frac{1}{n^2}\underbrace{(\sigma^2 + \sigma^2 + \cdots + \sigma^2)}_{n \text{ times}} = \frac{\sigma^2}{n}.$$
As $n \to \infty$, we have $\mathrm{Var}(Y_n) \to 0$.
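A small simulation sketch of the two steps follows. It assumes $X_i$ uniform on $[0, 1]$ (so $\mu = 1/2$ and $\sigma^2 = 1/12$), with an arbitrary seed and replication count. The empirical mean of $Y_n$ stays near $\mu$, while its empirical variance tracks $\sigma^2/n$.

```python
import random
import statistics

random.seed(1)                 # arbitrary seed
mu, sigma2 = 0.5, 1 / 12       # mean and variance of Uniform(0, 1)
reps = 2000                    # independent copies of Y_n drawn for each n

def sample_average(n):
    """One draw of Y_n = (X_1 + ... + X_n) / n with X_i ~ Uniform(0, 1)."""
    return sum(random.random() for _ in range(n)) / n

results = {}
for n in (10, 100, 1000):
    draws = [sample_average(n) for _ in range(reps)]
    results[n] = (statistics.fmean(draws), statistics.variance(draws))
    # n, empirical mean of Y_n, empirical variance of Y_n, sigma^2 / n
    print(n, round(results[n][0], 4), round(results[n][1], 6), round(sigma2 / n, 6))
```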
1.5.7 Central Limit Theorem

The Law of Large Numbers is a bit boring because the limit is nonrandom. This is because the denominator in the definition of $Y_n$ is so large that the variance of $Y_n$ converges to zero. If we want to prevent this, we should divide by $\sqrt{n}$ rather than $n$. In particular, if we again have a sequence of independent, identically distributed random variables, each with expected value $\mu$ and variance $\sigma^2$, but now we set
$$Z_n \triangleq \frac{(X_1 - \mu) + (X_2 - \mu) + \cdots + (X_n - \mu)}{\sqrt{n}},$$
then each $Z_n$ has expected value zero and
$$\mathrm{Var}(Z_n) = \frac{1}{n}\mathrm{Var}(X_1 + X_2 + \cdots + X_n) = \frac{n\sigma^2}{n} = \sigma^2.$$

As $n \to \infty$, the distributions of all the random variables $Z_n$ have the same degree of tightness, as measured by their variance, around their expected value 0. The Central Limit Theorem asserts that as $n \to \infty$, the distribution of $Z_n$ approaches that of a normal random variable with mean (expected value) zero and variance $\sigma^2$. In other words, for every interval $A \subseteq \mathbb{R}$,
$$\lim_{n \to \infty} \mathbb{P}\{Z_n \in A\} = \frac{1}{\sigma\sqrt{2\pi}} \int_A e^{-x^2/(2\sigma^2)} \, dx.$$
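The following is a simulation sketch of the statement with $A = (-\infty, a]$. It assumes $X_i$ uniform on $[0, 1]$, and the values of $n$, $a$, the seed, and the replication count are arbitrary (none come from the text). The empirical frequency of $\{Z_n \le a\}$ should be close to the corresponding $N(0, \sigma^2)$ probability.

```python
import math
import random

random.seed(2)                         # arbitrary seed
mu, sigma = 0.5, math.sqrt(1 / 12)     # Uniform(0, 1): mean 1/2, variance 1/12

def Z(n):
    """One draw of Z_n = ((X_1 - mu) + ... + (X_n - mu)) / sqrt(n)."""
    return sum(random.random() - mu for _ in range(n)) / math.sqrt(n)

def normal_cdf(x, s):
    """P{N(0, s^2) <= x}, written with the error function."""
    return 0.5 * (1 + math.erf(x / (s * math.sqrt(2))))

n, reps, a = 50, 20_000, 0.3           # A = (-infinity, a]
draws = [Z(n) for _ in range(reps)]
empirical = sum(z <= a for z in draws) / reps
print(round(empirical, 3), round(normal_cdf(a, sigma), 3))
```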
Chapter 2


Conditional Expectation 


Please see Hull’s book (Section 9.6.) 


2.1 A Binomial Model for Stock Price Dynamics 


Stock prices are assumed to follow this simple binomial model: The initial stock price during the period under study is denoted $S_0$. At each time step, the stock price either goes up by a factor of $u$ or down by a factor of $d$. It will be useful to visualize tossing a coin at each time step, and say that

• the stock price moves up by a factor of $u$ if the coin comes out heads ($H$), and

• down by a factor of $d$ if it comes out tails ($T$).

Note that we are not specifying the probability of heads here.


Consider a sequence of 3 tosses of the coin (see Fig. 2.1). The collection of all possible outcomes (i.e., sequences of tosses of length 3) is
$$\Omega = \{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\}.$$

A typical sequence of $\Omega$ will be denoted $\omega$, and $\omega_k$ will denote the $k$th element in the sequence $\omega$. We write $S_k(\omega)$ to denote the stock price at "time" $k$ (i.e., after $k$ tosses) under the outcome $\omega$. Note that $S_k(\omega)$ depends only on $\omega_1, \omega_2, \ldots, \omega_k$. Thus in the 3-coin-toss example we write, for instance,
$$S_1(\omega) \triangleq S_1(\omega_1, \omega_2, \omega_3) \triangleq S_1(\omega_1),$$
$$S_2(\omega) \triangleq S_2(\omega_1, \omega_2, \omega_3) \triangleq S_2(\omega_1, \omega_2).$$


Each $S_k$ is a random variable defined on the set $\Omega$. More precisely, let $\mathcal{F} = \mathcal{P}(\Omega)$. Then $\mathcal{F}$ is a $\sigma$-algebra and $(\Omega, \mathcal{F})$ is a measurable space. Each $S_k$ is an $\mathcal{F}$-measurable function $\Omega \to \mathbb{R}$, that is, $S_k^{-1}$ is a function $\mathcal{B} \to \mathcal{F}$, where $\mathcal{B}$ is the Borel $\sigma$-algebra on $\mathbb{R}$. We will see later that $S_k$ is in fact measurable under a sub-$\sigma$-algebra of $\mathcal{F}$. Recall that the Borel $\sigma$-algebra $\mathcal{B}$ is the $\sigma$-algebra generated by the open intervals of $\mathbb{R}$. In this course we will always deal with subsets of $\mathbb{R}$ that belong to $\mathcal{B}$.

[Figure 2.1: A three coin period binomial model. The tree runs from $S_0$ to $S_3$; for example, $S_2(TT) = d^2 S_0$, $S_3(TTH) = d^2 u S_0$, and $S_3(TTT) = d^3 S_0$.]


For any random variable $X$ defined on a sample space $\Omega$ and any $y \in \mathbb{R}$, we will use the notation
$$\{X \le y\} \triangleq \{\omega \in \Omega;\, X(\omega) \le y\}.$$

The sets $\{X < y\}$, $\{X \ge y\}$, $\{X = y\}$, etc., are defined similarly. Similarly, for any subset $B$ of $\mathbb{R}$, we define
$$\{X \in B\} \triangleq \{\omega \in \Omega;\, X(\omega) \in B\}.$$


Assumption 2.1 u > d > 0. 


2.2 Information 


Definition 2.1 (Sets determined by the first k tosses.) We say that a set $A \subseteq \Omega$ is determined by the first $k$ coin tosses if, knowing only the outcome of the first $k$ tosses, we can decide whether the outcome of all tosses is in $A$. In general we denote the collection of sets determined by the first $k$ tosses by $\mathcal{F}_k$. It is easy to check that $\mathcal{F}_k$ is a $\sigma$-algebra.

Note that the random variable $S_k$ is $\mathcal{F}_k$-measurable, for each $k = 1, 2, \ldots$.


Example 2.1 In the 3 coin-toss example, the collection $\mathcal{F}_1$ of sets determined by the first toss consists of:

1. $A_H \triangleq \{HHH, HHT, HTH, HTT\}$,
2. $A_T \triangleq \{THH, THT, TTH, TTT\}$,
3. $\Omega$,
4. $\emptyset$.

The collection $\mathcal{F}_2$ of sets determined by the first two tosses consists of:

1. $A_{HH} \triangleq \{HHH, HHT\}$,
2. $A_{HT} \triangleq \{HTH, HTT\}$,
3. $A_{TH} \triangleq \{THH, THT\}$,
4. $A_{TT} \triangleq \{TTH, TTT\}$,
5. The complements of the above sets,
6. Any union of the above sets (including the complements),
7. $\emptyset$ and $\Omega$.


Definition 2.2 (Information carried by a random variable.) Let $X$ be a random variable $\Omega \to \mathbb{R}$. We say that a set $A \subseteq \Omega$ is determined by the random variable $X$ if, knowing only the value $X(\omega)$ of the random variable, we can decide whether or not $\omega \in A$. Another way of saying this is that for every $y \in \mathbb{R}$, either $X^{-1}(y) \subseteq A$ or $X^{-1}(y) \cap A = \emptyset$. The collection of subsets of $\Omega$ determined by $X$ is a $\sigma$-algebra, which we call the $\sigma$-algebra generated by $X$, and denote by $\sigma(X)$.

If the random variable $X$ takes finitely many different values, then $\sigma(X)$ is generated by the collection of sets
$$\{X^{-1}(X(\omega)) \mid \omega \in \Omega\};$$
these sets are called the atoms of the $\sigma$-algebra $\sigma(X)$.

In general, if $X$ is a random variable $\Omega \to \mathbb{R}$, then $\sigma(X)$ is given by
$$\sigma(X) = \{X^{-1}(B);\, B \in \mathcal{B}\}.$$
Example 2.2 (Sets determined by $S_2$) The $\sigma$-algebra generated by $S_2$ consists of the following sets:

1. $A_{HH} = \{HHH, HHT\} = \{\omega \in \Omega;\, S_2(\omega) = u^2 S_0\}$,
2. $A_{TT} = \{TTH, TTT\} = \{S_2 = d^2 S_0\}$,
3. $A_{HT} \cup A_{TH} = \{S_2 = udS_0\}$,
4. Complements of the above sets,
5. Any union of the above sets,
6. $\emptyset = \{S_2(\omega) \in \emptyset\}$,
7. $\Omega = \{S_2(\omega) \in \mathbb{R}\}$.
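The atoms of $\sigma(S_2)$ are just the level sets of $S_2$, and $\sigma(S_2)$ is the collection of all their unions. The sketch below uses the hypothetical values $u = 2$, $d = 1/2$, $S_0 = 4$. It groups the outcomes by the value of $S_2$ and then forms every union of atoms.

```python
from fractions import Fraction
from itertools import combinations, product

u, d, S0 = Fraction(2), Fraction(1, 2), Fraction(4)   # hypothetical numbers
omega = ["".join(w) for w in product("HT", repeat=3)]

def S(k, w):
    """Stock price after the first k tosses of the outcome w."""
    price = S0
    for toss in w[:k]:
        price *= u if toss == "H" else d
    return price

# Atoms of sigma(S_2): the level sets {S_2 = x}, one for each value x of S_2.
atoms = {}
for w in omega:
    atoms.setdefault(S(2, w), set()).add(w)

for value, atom in sorted(atoms.items()):
    print(value, sorted(atom))
# 1 ['TTH', 'TTT']
# 4 ['HTH', 'HTT', 'THH', 'THT']
# 16 ['HHH', 'HHT']

# sigma(S_2) is the collection of all unions of atoms (the empty union is the empty set).
sigma_S2 = {frozenset().union(*c)
            for r in range(len(atoms) + 1)
            for c in combinations(atoms.values(), r)}
print(len(sigma_S2))   # 8
```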
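2.3 Conditional Expectation

In order to talk about conditional expectation, we need to introduce a probability measure on our coin-toss sample space $\Omega$. Let us define

• $p \in (0, 1)$ is the probability of $H$,

• $q \triangleq 1 - p$ is the probability of $T$,

• the coin tosses are independent, so that, e.g., $\mathbb{P}(HHT) = p^2 q$, etc.,

• $\mathbb{P}(A) \triangleq \sum_{\omega \in A} \mathbb{P}(\omega)$, $\forall A \subseteq \Omega$.

Definition 2.3 (Expectation.)
$$\mathbb{E}X \triangleq \sum_{\omega \in \Omega} X(\omega)\,\mathbb{P}(\omega).$$
If $A \subseteq \Omega$ then
$$\mathbb{I}_A(\omega) \triangleq \begin{cases} 1 & \text{if } \omega \in A, \\ 0 & \text{if } \omega \notin A, \end{cases}$$
and
$$\mathbb{E}(\mathbb{I}_A X) = \int_A X \, d\mathbb{P} = \sum_{\omega \in A} X(\omega)\,\mathbb{P}(\omega).$$

We can think of $\mathbb{E}(\mathbb{I}_A X)$ as a partial average of $X$ over the set $A$.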


2.3.1 An example 


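Let us estimate $S_1$, given $S_2$. Denote the estimate by $\mathbb{E}(S_1 \mid S_2)$. From elementary probability, $\mathbb{E}(S_1 \mid S_2)$ is a random variable $Y$ whose value at $\omega$ is defined by
$$Y(\omega) = \mathbb{E}(S_1 \mid S_2 = y),$$
where $y = S_2(\omega)$. Properties of $\mathbb{E}(S_1 \mid S_2)$:

• $\mathbb{E}(S_1 \mid S_2)$ should depend on $\omega$, i.e., it is a random variable.

• If the value of $S_2$ is known, then the value of $\mathbb{E}(S_1 \mid S_2)$ should also be known. In particular,

– If $\omega = HHH$ or $\omega = HHT$, then $S_2(\omega) = u^2 S_0$. If we know that $S_2(\omega) = u^2 S_0$, then even without knowing $\omega$, we know that $S_1(\omega) = uS_0$. We define
$$\mathbb{E}(S_1 \mid S_2)(HHH) = \mathbb{E}(S_1 \mid S_2)(HHT) = uS_0.$$

– If $\omega = TTT$ or $\omega = TTH$, then $S_2(\omega) = d^2 S_0$. If we know that $S_2(\omega) = d^2 S_0$, then even without knowing $\omega$, we know that $S_1(\omega) = dS_0$. We define
$$\mathbb{E}(S_1 \mid S_2)(TTT) = \mathbb{E}(S_1 \mid S_2)(TTH) = dS_0.$$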
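– If $\omega \in A = \{HTH, HTT, THH, THT\}$, then $S_2(\omega) = udS_0$. If we know $S_2(\omega) = udS_0$, then we do not know whether $S_1 = uS_0$ or $S_1 = dS_0$. We then take a weighted average:
$$\mathbb{P}(A) = p^2 q + pq^2 + p^2 q + pq^2 = 2pq.$$
Furthermore,
$$\int_A S_1 \, d\mathbb{P} = p^2 q\, uS_0 + pq^2\, uS_0 + p^2 q\, dS_0 + pq^2\, dS_0 = pq(u + d)S_0.$$
For $\omega \in A$ we define
$$\mathbb{E}(S_1 \mid S_2)(\omega) = \frac{\int_A S_1 \, d\mathbb{P}}{\mathbb{P}(A)} = \tfrac{1}{2}(u + d)S_0.$$
Then
$$\int_A \mathbb{E}(S_1 \mid S_2) \, d\mathbb{P} = \int_A S_1 \, d\mathbb{P}.$$

In conclusion, we can write
$$\mathbb{E}(S_1 \mid S_2)(\omega) = g(S_2(\omega)),$$
where
$$g(x) = \begin{cases} uS_0 & \text{if } x = u^2 S_0, \\ \tfrac{1}{2}(u + d)S_0 & \text{if } x = udS_0, \\ dS_0 & \text{if } x = d^2 S_0. \end{cases}$$
In other words, $\mathbb{E}(S_1 \mid S_2)$ is random only through dependence on $S_2$. We also write
$$\mathbb{E}(S_1 \mid S_2 = x) = g(x),$$
where $g$ is the function defined above.

The random variable $\mathbb{E}(S_1 \mid S_2)$ has two fundamental properties:

• $\mathbb{E}(S_1 \mid S_2)$ is $\sigma(S_2)$-measurable.

• For every set $A \in \sigma(S_2)$,
$$\int_A \mathbb{E}(S_1 \mid S_2) \, d\mathbb{P} = \int_A S_1 \, d\mathbb{P}.$$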
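The construction above can be sketched in exact arithmetic. The values $p = 1/3$, $u = 2$, $d = 1/2$, $S_0 = 4$ are assumptions for illustration. The sketch averages $S_1$ over each level set of $S_2$, confirms the three values of $g$, and checks partial averaging on every $A \in \sigma(S_2)$.

```python
from fractions import Fraction
from itertools import combinations, product

p = Fraction(1, 3)                      # hypothetical P(H)
q = 1 - p
u, d, S0 = Fraction(2), Fraction(1, 2), Fraction(4)

omega = ["".join(w) for w in product("HT", repeat=3)]
P = {w: p ** w.count("H") * q ** w.count("T") for w in omega}

def S(k, w):
    """Stock price after the first k tosses of the outcome w."""
    price = S0
    for toss in w[:k]:
        price *= u if toss == "H" else d
    return price

def cond_exp(X, Y):
    """E(X | Y) on a finite space: P-weighted average of X over each level set {Y = y}."""
    out = {}
    for y in set(Y.values()):
        atom = [w for w in omega if Y[w] == y]
        avg = sum(X[w] * P[w] for w in atom) / sum(P[w] for w in atom)
        out.update({w: avg for w in atom})
    return out

S1 = {w: S(1, w) for w in omega}
S2 = {w: S(2, w) for w in omega}
E = cond_exp(S1, S2)

print(E["HHH"] == u * S0, E["TTT"] == d * S0, E["HTH"] == (u + d) * S0 / 2)  # True True True

# Partial averaging on every A in sigma(S_2), i.e., every union of level sets of S_2.
atoms = [[w for w in omega if S2[w] == y] for y in set(S2.values())]
print(all(sum(E[w] * P[w] for a in c for w in a) == sum(S1[w] * P[w] for a in c for w in a)
          for r in range(len(atoms) + 1) for c in combinations(atoms, r)))   # True
```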


2.3.2 Definition of Conditional Expectation 


Please see Williams, p. 83.

Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space, and let $\mathcal{G}$ be a sub-$\sigma$-algebra of $\mathcal{F}$. Let $X$ be a random variable on $(\Omega, \mathcal{F}, \mathbb{P})$. Then $\mathbb{E}(X \mid \mathcal{G})$ is defined to be any random variable $Y$ that satisfies:

(a) $Y$ is $\mathcal{G}$-measurable,

(b) For every set $A \in \mathcal{G}$, we have the "partial averaging property"
$$\int_A Y \, d\mathbb{P} = \int_A X \, d\mathbb{P}.$$

Existence. There is always a random variable $Y$ satisfying the above properties (provided that $\mathbb{E}|X| < \infty$), i.e., conditional expectations always exist.

Uniqueness. There can be more than one random variable $Y$ satisfying the above properties, but if $Y'$ is another one, then $Y = Y'$ almost surely, i.e., $\mathbb{P}\{\omega \in \Omega;\, Y(\omega) = Y'(\omega)\} = 1$.


Notation 2.1 For random variables $X$, $Y$, it is standard notation to write
$$\mathbb{E}(X \mid Y) \triangleq \mathbb{E}(X \mid \sigma(Y)).$$
Here are some useful ways to think about $\mathbb{E}(X \mid \mathcal{G})$:

• A random experiment is performed, i.e., an element $\omega$ of $\Omega$ is selected. The value of $\omega$ is partially but not fully revealed to us, and thus we cannot compute the exact value of $X(\omega)$. Based on what we know about $\omega$, we compute an estimate of $X(\omega)$. Because this estimate depends on the partial information we have about $\omega$, it depends on $\omega$, i.e., $\mathbb{E}[X \mid \mathcal{G}](\omega)$ is a function of $\omega$, although the dependence on $\omega$ is often not shown explicitly.

• If the $\sigma$-algebra $\mathcal{G}$ contains finitely many sets, there will be a "smallest" set $A$ in $\mathcal{G}$ containing $\omega$, which is the intersection of all sets in $\mathcal{G}$ containing $\omega$. The way $\omega$ is partially revealed to us is that we are told it is in $A$, but not told which element of $A$ it is. We then define $\mathbb{E}[X \mid \mathcal{G}](\omega)$ to be the average (with respect to $\mathbb{P}$) value of $X$ over this set $A$. Thus, for all $\omega$ in this set $A$, $\mathbb{E}[X \mid \mathcal{G}](\omega)$ will be the same.
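2.3.3 Further discussion of Partial Averaging

The partial averaging property is
$$\int_A \mathbb{E}(X \mid \mathcal{G}) \, d\mathbb{P} = \int_A X \, d\mathbb{P}, \quad \forall A \in \mathcal{G}. \qquad (3.1)$$
We can rewrite this as
$$\mathbb{E}[\mathbb{I}_A \cdot \mathbb{E}(X \mid \mathcal{G})] = \mathbb{E}[\mathbb{I}_A \cdot X]. \qquad (3.2)$$
Note that $\mathbb{I}_A$ is a $\mathcal{G}$-measurable random variable. In fact the following holds:

Lemma 3.10 If $V$ is any $\mathcal{G}$-measurable random variable, then provided $\mathbb{E}|V \cdot \mathbb{E}(X \mid \mathcal{G})| < \infty$,
$$\mathbb{E}[V \cdot \mathbb{E}(X \mid \mathcal{G})] = \mathbb{E}[V \cdot X]. \qquad (3.3)$$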
Proof: To see this, first use (3.2) and linearity of expectations to prove (3.3) when $V$ is a simple $\mathcal{G}$-measurable random variable, i.e., $V$ is of the form $V = \sum_{k=1}^n c_k \mathbb{I}_{A_k}$, where each $A_k$ is in $\mathcal{G}$ and each $c_k$ is constant. Next consider the case that $V$ is a nonnegative $\mathcal{G}$-measurable random variable, but is not necessarily simple. Such a $V$ can be written as the limit of an increasing sequence of simple random variables $V_n$; we write (3.3) for each $V_n$ and then pass to the limit, using the Monotone Convergence Theorem (see Williams), to obtain (3.3) for $V$. Finally, the general $\mathcal{G}$-measurable random variable $V$ can be written as the difference of two nonnegative random variables $V = V^+ - V^-$, and since (3.3) holds for $V^+$ and $V^-$ it must hold for $V$ as well. Williams calls this argument the "standard machine" (p. 56). □

Based on this lemma, we can replace the second condition in the definition of a conditional expectation (Section 2.3.2) by:

(b') For every $\mathcal{G}$-measurable random variable $V$, we have
$$\mathbb{E}[V \cdot \mathbb{E}(X \mid \mathcal{G})] = \mathbb{E}[V \cdot X]. \qquad (3.4)$$


2.3.4 Properties of Conditional Expectation 


Please see Williams p. 88. Proof sketches of some of the properties are provided below.

(a) $\mathbb{E}(\mathbb{E}(X \mid \mathcal{G})) = \mathbb{E}(X)$.

Proof: Just take $A$ in the partial averaging property to be $\Omega$.

The conditional expectation of $X$ is thus an unbiased estimator of the random variable $X$.

(b) If $X$ is $\mathcal{G}$-measurable, then
$$\mathbb{E}(X \mid \mathcal{G}) = X.$$
Proof: The partial averaging property holds trivially when $Y$ is replaced by $X$. And since $X$ is $\mathcal{G}$-measurable, $X$ satisfies the requirement (a) of a conditional expectation as well.

If the information content of $\mathcal{G}$ is sufficient to determine $X$, then the best estimate of $X$ based on $\mathcal{G}$ is $X$ itself.

(c) (Linearity)
$$\mathbb{E}(a_1 X_1 + a_2 X_2 \mid \mathcal{G}) = a_1 \mathbb{E}(X_1 \mid \mathcal{G}) + a_2 \mathbb{E}(X_2 \mid \mathcal{G}).$$

(d) (Positivity) If $X \ge 0$ almost surely, then
$$\mathbb{E}(X \mid \mathcal{G}) \ge 0.$$
Proof: Take $A = \{\omega \in \Omega;\, \mathbb{E}(X \mid \mathcal{G})(\omega) < 0\}$. This set is in $\mathcal{G}$ since $\mathbb{E}(X \mid \mathcal{G})$ is $\mathcal{G}$-measurable. Partial averaging implies $\int_A \mathbb{E}(X \mid \mathcal{G}) \, d\mathbb{P} = \int_A X \, d\mathbb{P}$. The right-hand side is greater than or equal to zero, and the left-hand side is strictly negative, unless $\mathbb{P}(A) = 0$. Therefore, $\mathbb{P}(A) = 0$.
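(h) (Jensen's Inequality) If $\varphi: \mathbb{R} \to \mathbb{R}$ is convex and $\mathbb{E}|\varphi(X)| < \infty$, then
$$\mathbb{E}(\varphi(X) \mid \mathcal{G}) \ge \varphi(\mathbb{E}(X \mid \mathcal{G})).$$
Recall the usual Jensen's Inequality: $\mathbb{E}\varphi(X) \ge \varphi(\mathbb{E}(X))$.

(i) (Tower Property) If $\mathcal{H}$ is a sub-$\sigma$-algebra of $\mathcal{G}$, then
$$\mathbb{E}(\mathbb{E}(X \mid \mathcal{G}) \mid \mathcal{H}) = \mathbb{E}(X \mid \mathcal{H}).$$

$\mathcal{H}$ being a sub-$\sigma$-algebra of $\mathcal{G}$ means that $\mathcal{G}$ contains at least as much information as $\mathcal{H}$. If we estimate $X$ based on the information in $\mathcal{G}$, and then estimate the estimator based on the smaller amount of information in $\mathcal{H}$, then we get the same result as if we had estimated $X$ directly based on the information in $\mathcal{H}$.

(j) (Taking out what is known) If $Z$ is $\mathcal{G}$-measurable, then
$$\mathbb{E}(ZX \mid \mathcal{G}) = Z \cdot \mathbb{E}(X \mid \mathcal{G}).$$

When conditioning on $\mathcal{G}$, the $\mathcal{G}$-measurable random variable $Z$ acts like a constant.

Proof: Let $Z$ be a $\mathcal{G}$-measurable random variable. A random variable $Y$ is $\mathbb{E}(ZX \mid \mathcal{G})$ if and only if

(a) $Y$ is $\mathcal{G}$-measurable;
(b) $\int_A Y \, d\mathbb{P} = \int_A ZX \, d\mathbb{P}$, $\forall A \in \mathcal{G}$.

Take $Y = Z \cdot \mathbb{E}(X \mid \mathcal{G})$. Then $Y$ satisfies (a) (a product of $\mathcal{G}$-measurable random variables is $\mathcal{G}$-measurable). $Y$ also satisfies property (b), as we can check below:
$$\begin{aligned}
\int_A Y \, d\mathbb{P} &= \mathbb{E}(\mathbb{I}_A Y) \\
&= \mathbb{E}[\mathbb{I}_A Z \, \mathbb{E}(X \mid \mathcal{G})] \\
&= \mathbb{E}[\mathbb{I}_A Z X] \qquad \text{((b') with } V = \mathbb{I}_A Z\text{)} \\
&= \int_A ZX \, d\mathbb{P}.
\end{aligned}$$

(k) (Role of Independence) If $\mathcal{H}$ is independent of $\sigma(\sigma(X), \mathcal{G})$, then
$$\mathbb{E}(X \mid \sigma(\mathcal{G}, \mathcal{H})) = \mathbb{E}(X \mid \mathcal{G}).$$
In particular, if $X$ is independent of $\mathcal{H}$, then
$$\mathbb{E}(X \mid \mathcal{H}) = \mathbb{E}(X).$$

If $\mathcal{H}$ is independent of $X$ and $\mathcal{G}$, then nothing is gained by including the information content of $\mathcal{H}$ in the estimation of $X$.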
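2.3.5 Examples from the Binomial Model

Recall that $\mathcal{F}_1 = \{\emptyset, A_H, A_T, \Omega\}$. Notice that $\mathbb{E}(S_2 \mid \mathcal{F}_1)$ must be constant on $A_H$ and $A_T$. Now since $\mathbb{E}(S_2 \mid \mathcal{F}_1)$ must satisfy the partial averaging property,
$$\int_{A_H} \mathbb{E}(S_2 \mid \mathcal{F}_1) \, d\mathbb{P} = \int_{A_H} S_2 \, d\mathbb{P}, \qquad \int_{A_T} \mathbb{E}(S_2 \mid \mathcal{F}_1) \, d\mathbb{P} = \int_{A_T} S_2 \, d\mathbb{P}.$$

We compute
$$\int_{A_H} \mathbb{E}(S_2 \mid \mathcal{F}_1) \, d\mathbb{P} = \mathbb{P}(A_H)\,\mathbb{E}(S_2 \mid \mathcal{F}_1)(\omega) = p\,\mathbb{E}(S_2 \mid \mathcal{F}_1)(\omega), \quad \forall \omega \in A_H.$$

On the other hand,
$$\int_{A_H} S_2 \, d\mathbb{P} = p^2 u^2 S_0 + pq\, udS_0.$$

Therefore,
$$\mathbb{E}(S_2 \mid \mathcal{F}_1)(\omega) = pu^2 S_0 + qudS_0, \quad \forall \omega \in A_H.$$
We can also write
$$\mathbb{E}(S_2 \mid \mathcal{F}_1)(\omega) = pu^2 S_0 + qudS_0 = (pu + qd)uS_0 = (pu + qd)S_1(\omega), \quad \forall \omega \in A_H.$$

Similarly,
$$\mathbb{E}(S_2 \mid \mathcal{F}_1)(\omega) = (pu + qd)S_1(\omega), \quad \forall \omega \in A_T.$$

Thus in both cases we have
$$\mathbb{E}(S_2 \mid \mathcal{F}_1)(\omega) = (pu + qd)S_1(\omega), \quad \forall \omega \in \Omega.$$
A similar argument one time step later shows that
$$\mathbb{E}(S_3 \mid \mathcal{F}_2)(\omega) = (pu + qd)S_2(\omega).$$

We leave the verification of this equality as an exercise. We can verify the Tower Property; for instance, from the previous equations we have
$$\begin{aligned}
\mathbb{E}[\mathbb{E}(S_3 \mid \mathcal{F}_2) \mid \mathcal{F}_1] &= \mathbb{E}[(pu + qd)S_2 \mid \mathcal{F}_1] \\
&= (pu + qd)\,\mathbb{E}(S_2 \mid \mathcal{F}_1) \qquad \text{(linearity)} \\
&= (pu + qd)^2 S_1.
\end{aligned}$$

This final expression is $\mathbb{E}(S_3 \mid \mathcal{F}_1)$.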
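The same computations can be sketched numerically: $\mathbb{E}(X \mid \mathcal{F}_k)$ averages $X$ over the outcomes that share the first $k$ tosses. The sketch assumes $p = 1/2$, $u = 2$, $d = 1/2$, $S_0 = 4$ (hypothetical values, so that $pu + qd = 5/4$). It checks $\mathbb{E}(S_2 \mid \mathcal{F}_1) = (pu + qd)S_1$, $\mathbb{E}(S_3 \mid \mathcal{F}_2) = (pu + qd)S_2$, and the tower property in exact arithmetic.

```python
from fractions import Fraction
from itertools import product

p = Fraction(1, 2)                     # hypothetical P(H); here pu + qd = 5/4
q = 1 - p
u, d, S0 = Fraction(2), Fraction(1, 2), Fraction(4)
omega = ["".join(w) for w in product("HT", repeat=3)]
P = {w: p ** w.count("H") * q ** w.count("T") for w in omega}

def S(k, w):
    """Stock price after the first k tosses of the outcome w."""
    return S0 * u ** w[:k].count("H") * d ** w[:k].count("T")

def cond_exp_F(X, k):
    """E(X | F_k): average X over the outcomes that share the first k tosses."""
    out = {}
    for w in omega:
        atom = [v for v in omega if v[:k] == w[:k]]
        out[w] = sum(X[v] * P[v] for v in atom) / sum(P[v] for v in atom)
    return out

S1, S2, S3 = ({w: S(k, w) for w in omega} for k in (1, 2, 3))
m = p * u + q * d

E21 = cond_exp_F(S2, 1)
E32 = cond_exp_F(S3, 2)
E31 = cond_exp_F(S3, 1)
tower = cond_exp_F(E32, 1)

print(all(E21[w] == m * S1[w] for w in omega))                  # True: E(S2|F1) = (pu+qd) S1
print(all(E32[w] == m * S2[w] for w in omega))                  # True: E(S3|F2) = (pu+qd) S2
print(all(tower[w] == E31[w] == m ** 2 * S1[w] for w in omega)) # True: tower property
```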
2.4 Martingales

The ingredients are:

• A probability space $(\Omega, \mathcal{F}, \mathbb{P})$.

• A sequence of $\sigma$-algebras $\mathcal{F}_0, \mathcal{F}_1, \ldots, \mathcal{F}_n$, with the property that $\mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \cdots \subseteq \mathcal{F}_n \subseteq \mathcal{F}$. Such a sequence of $\sigma$-algebras is called a filtration.

• A sequence of random variables $M_0, M_1, \ldots, M_n$. This is called a stochastic process.

Conditions for a martingale:

1. Each $M_k$ is $\mathcal{F}_k$-measurable. If you know the information in $\mathcal{F}_k$, then you know the value of $M_k$. We say that the process $\{M_k\}$ is adapted to the filtration $\{\mathcal{F}_k\}$.

2. For each $k$, $\mathbb{E}(M_{k+1} \mid \mathcal{F}_k) = M_k$. Martingales tend to go neither up nor down.

A supermartingale tends to go down, i.e., the second condition above is replaced by $\mathbb{E}(M_{k+1} \mid \mathcal{F}_k) \le M_k$; a submartingale tends to go up, i.e., $\mathbb{E}(M_{k+1} \mid \mathcal{F}_k) \ge M_k$.

Example 2.3 (Example from the binomial model.) For $k = 1, 2$ we already showed that
$$\mathbb{E}(S_{k+1} \mid \mathcal{F}_k) = (pu + qd)S_k.$$

For $k = 0$, we set $\mathcal{F}_0 = \{\emptyset, \Omega\}$, the "trivial $\sigma$-algebra". This $\sigma$-algebra contains no information, and any $\mathcal{F}_0$-measurable random variable must be constant (nonrandom). Therefore, by definition, $\mathbb{E}(S_1 \mid \mathcal{F}_0)$ is that constant which satisfies the averaging property
$$\int_\Omega \mathbb{E}(S_1 \mid \mathcal{F}_0) \, d\mathbb{P} = \int_\Omega S_1 \, d\mathbb{P}.$$

The right-hand side is $\mathbb{E}S_1 = (pu + qd)S_0$, and so we have
$$\mathbb{E}(S_1 \mid \mathcal{F}_0) = (pu + qd)S_0.$$
In conclusion,

• If $pu + qd = 1$ then $\{S_k, \mathcal{F}_k;\, k = 0, 1, 2, 3\}$ is a martingale.
• If $pu + qd > 1$ then $\{S_k, \mathcal{F}_k;\, k = 0, 1, 2, 3\}$ is a submartingale.
• If $pu + qd < 1$ then $\{S_k, \mathcal{F}_k;\, k = 0, 1, 2, 3\}$ is a supermartingale.


Chapter 3 


Arbitrage Pricing 


3.1 Binomial Pricing 


Return to the binomial pricing model 


Please see: 


• Cox, Ross and Rubinstein, J. Financial Economics, 7 (1979), 229-263, and

• Cox and Rubinstein (1985), Options Markets, Prentice-Hall.


Example 3.1 (Pricing a Call Option) Suppose $u = 2$, $d = 0.5$, $r = 25\%$ (interest rate), $S_0 = 50$. (In this and all examples, the interest rate quoted is per unit time, and the stock prices $S_0, S_1, \ldots$ are indexed by the same time periods.) We know that
$$S_1(\omega) = \begin{cases} 100 & \text{if } \omega_1 = H, \\ 25 & \text{if } \omega_1 = T. \end{cases}$$

Find the value at time zero of a call option to buy one share of stock at time 1 for \$50 (i.e., the strike price is \$50).

The value of the call at time 1 is
$$V_1(\omega) = (S_1(\omega) - 50)^+ = \begin{cases} 50 & \text{if } \omega_1 = H, \\ 0 & \text{if } \omega_1 = T. \end{cases}$$

Suppose the option sells for \$20 at time 0. Let us construct a portfolio:

1. Sell 3 options for \$20 each. Cash outlay is -\$60.
2. Buy 2 shares of stock for \$50 each. Cash outlay is \$100.
3. Borrow \$40. Cash outlay is -\$40.

This portfolio thus requires no initial investment. For this portfolio, the cash outlay at time 1 (in dollars) is:

                     ω1 = H    ω1 = T
    Pay off option      150         0
    Sell stock         -200       -50
    Pay off debt         50        50
    Total                 0         0

The arbitrage pricing theory (APT) value of the option at time 0 is $V_0 = 20$. □
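The cash-flow table can be recomputed for any quoted option price. The sketch below keeps the portfolio from the example (sell 3 options, buy 2 shares, borrow the difference). The function name `time1_cash_outlay` is a hypothetical helper for illustration. Negative outlays are cash received.

```python
# Example 3.1 numbers: u = 2, d = 0.5, r = 0.25, S0 = 50, strike 50.
u, d, r, S0, K = 2.0, 0.5, 0.25, 50.0, 50.0
S1 = {"H": u * S0, "T": d * S0}                 # 100 and 25
V1 = {w: max(S1[w] - K, 0.0) for w in S1}       # 50 and 0

def time1_cash_outlay(option_price):
    """Sell 3 options, buy 2 shares, borrow the difference at rate r.

    Returns the time-0 cash outlay and the time-1 cash outlay in each state
    (negative numbers are cash received)."""
    borrowed = 2 * S0 - 3 * option_price        # $40 when the option sells for $20
    t0 = -3 * option_price + 2 * S0 - borrowed  # zero by construction
    t1 = {w: 3 * V1[w] - 2 * S1[w] + (1 + r) * borrowed for w in S1}
    return t0, t1

print(time1_cash_outlay(20.0))   # (0.0, {'H': 0.0, 'T': 0.0})
print(time1_cash_outlay(21.0))   # (0.0, {'H': -3.75, 'T': -3.75})
```

At any price above \$20 the same portfolio receives cash in both states at time 1 with no initial investment, which is an arbitrage; at a price below \$20 the reversed portfolio does the same.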


Assumptions underlying APT: 


• Unlimited short selling of stock.
• Unlimited borrowing.
• No transaction costs.
• Agent is a "small investor", i.e., his/her trading does not move the market.

Important Observation: The APT value of the option does not depend on the probabilities of $H$ and $T$.


3.2 General one-step APT 


Suppose a derivative security pays off the amount $V_1$ at time 1, where $V_1$ is an $\mathcal{F}_1$-measurable random variable. (This measurability condition is important; this is why it does not make sense to use some stock unrelated to the derivative security in valuing it, at least in the straightforward method described below.)

• Sell the security for $V_0$ at time 0. ($V_0$ is to be determined later.)

• Buy $\Delta_0$ shares of stock at time 0. ($\Delta_0$ is also to be determined later.)

• Invest $V_0 - \Delta_0 S_0$ in the money market, at risk-free interest rate $r$. ($V_0 - \Delta_0 S_0$ might be negative.)

• Then wealth at time 1 is
$$X_1 \triangleq \Delta_0 S_1 + (1 + r)(V_0 - \Delta_0 S_0) = (1 + r)V_0 + \Delta_0(S_1 - (1 + r)S_0).$$

• We want to choose $V_0$ and $\Delta_0$ so that
$$X_1 = V_1$$
regardless of whether the stock goes up or down.
The last condition above can be expressed by two equations (which is fortunate since there are two unknowns):
$$(1 + r)V_0 + \Delta_0(S_1(H) - (1 + r)S_0) = V_1(H), \qquad (2.1)$$
$$(1 + r)V_0 + \Delta_0(S_1(T) - (1 + r)S_0) = V_1(T). \qquad (2.2)$$

Note that this is where we use the fact that the derivative security value $V_1$ is a function of $S_1$, i.e., when $S_1$ is known for a given $\omega$, $V_1$ is known (and therefore nonrandom) at that $\omega$ as well. Subtracting the second equation above from the first gives
$$\Delta_0 = \frac{V_1(H) - V_1(T)}{S_1(H) - S_1(T)}. \qquad (2.3)$$

Plug the formula (2.3) for $\Delta_0$ into (2.1):
$$\begin{aligned}
(1 + r)V_0 &= V_1(H) - \Delta_0(S_1(H) - (1 + r)S_0) \\
&= V_1(H) - \frac{V_1(H) - V_1(T)}{(u - d)S_0}(u - 1 - r)S_0 \\
&= \frac{1}{u - d}\left[(u - d)V_1(H) - (V_1(H) - V_1(T))(u - 1 - r)\right] \\
&= \frac{1 + r - d}{u - d}V_1(H) + \frac{u - 1 - r}{u - d}V_1(T).
\end{aligned}$$

We have already assumed $u > d > 0$. We now also assume $d < 1 + r < u$ (otherwise there would be an arbitrage opportunity). Define
$$\tilde{p} \triangleq \frac{1 + r - d}{u - d}, \qquad \tilde{q} \triangleq \frac{u - 1 - r}{u - d}.$$

Then $\tilde{p} > 0$ and $\tilde{q} > 0$. Since $\tilde{p} + \tilde{q} = 1$, we have $0 < \tilde{p} < 1$ and $\tilde{q} = 1 - \tilde{p}$. Thus, $\tilde{p}$, $\tilde{q}$ are like probabilities. We will return to this later. Thus the price of the derivative security at time 0 is given by
$$V_0 = \frac{1}{1 + r}\left[\tilde{p}V_1(H) + \tilde{q}V_1(T)\right]. \qquad (2.4)$$
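Formulas (2.3) and (2.4) translate directly into a small pricing routine. The sketch below uses exact fractions, and its function name is a hypothetical helper for illustration. It reproduces Example 3.1 ($V_0 = 20$, $\Delta_0 = 2/3$ share per option, matching 2 shares per 3 options) and then confirms that $X_1 = V_1$ in both states.

```python
from fractions import Fraction

def one_step_price(S0, u, d, r, V1H, V1T):
    """Return (V0, Delta0, p_tilde) from (2.3) and (2.4); requires d < 1 + r < u."""
    if not (0 < d < 1 + r < u):
        raise ValueError("need 0 < d < 1 + r < u, otherwise there is an arbitrage")
    p_t = (1 + r - d) / (u - d)
    q_t = (u - 1 - r) / (u - d)
    delta0 = (V1H - V1T) / (u * S0 - d * S0)          # (2.3)
    V0 = (p_t * V1H + q_t * V1T) / (1 + r)            # (2.4)
    return V0, delta0, p_t

# Example 3.1: S0 = 50, u = 2, d = 1/2, r = 1/4, call struck at 50.
S0, u, d, r = Fraction(50), Fraction(2), Fraction(1, 2), Fraction(1, 4)
V0, delta0, p_t = one_step_price(S0, u, d, r, Fraction(50), Fraction(0))
print(V0, delta0, p_t)   # 20 2/3 1/2

# Replication check: X1 = (1 + r) V0 + Delta0 (S1 - (1 + r) S0) equals V1 in both states.
X1 = {s: (1 + r) * V0 + delta0 * (S1 - (1 + r) * S0) for s, S1 in (("H", u * S0), ("T", d * S0))}
print(X1 == {"H": 50, "T": 0})   # True
```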


3.3 Risk-Neutral Probability Measure 


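Let $\Omega$ be the set of possible outcomes from $n$ coin tosses. Construct a probability measure $\widetilde{\mathbb{P}}$ on $\Omega$ by the formula
$$\widetilde{\mathbb{P}}(\omega_1, \omega_2, \ldots, \omega_n) \triangleq \tilde{p}^{\,\#\{j;\, \omega_j = H\}}\, \tilde{q}^{\,\#\{j;\, \omega_j = T\}}.$$

$\widetilde{\mathbb{P}}$ is called the risk-neutral probability measure. We denote by $\widetilde{\mathbb{E}}$ the expectation under $\widetilde{\mathbb{P}}$. Equation (2.4) says
$$V_0 = \widetilde{\mathbb{E}}\left[\frac{1}{1 + r}V_1\right].$$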
° Ge 1) 
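Theorem 3.11 Under $\widetilde{\mathbb{P}}$, the discounted stock price process $\{(1 + r)^{-k}S_k, \mathcal{F}_k\}_{k=0}^n$ is a martingale.

Proof:
$$\begin{aligned}
\widetilde{\mathbb{E}}[(1 + r)^{-(k+1)}S_{k+1} \mid \mathcal{F}_k] &= (1 + r)^{-(k+1)}(\tilde{p}u + \tilde{q}d)S_k \\
&= (1 + r)^{-(k+1)}\,\frac{u(1 + r - d) + d(u - 1 - r)}{u - d}\,S_k \\
&= (1 + r)^{-(k+1)}\,\frac{(u - d)(1 + r)}{u - d}\,S_k \\
&= (1 + r)^{-k}S_k.
\end{aligned}$$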
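The proof reduces to the identity $\tilde{p}u + \tilde{q}d = 1 + r$. The short sketch below checks the resulting one-step martingale property at a few nodes in exact arithmetic. The parameter sets are hypothetical, chosen only to satisfy $0 < d < 1 + r < u$.

```python
from fractions import Fraction

def risk_neutral(u, d, r):
    """(p_tilde, q_tilde) from the one-step formulas."""
    p_t = (1 + r - d) / (u - d)
    return p_t, 1 - p_t

def discounted_martingale_step(Sk, k, u, d, r):
    """Does E~[(1+r)^{-(k+1)} S_{k+1} | F_k] equal (1+r)^{-k} S_k at a node with price Sk?"""
    p_t, q_t = risk_neutral(u, d, r)
    lhs = (p_t * u * Sk + q_t * d * Sk) / (1 + r) ** (k + 1)
    return lhs == Sk / (1 + r) ** k

# Hypothetical parameter sets (u, d, r) satisfying 0 < d < 1 + r < u.
params = [(Fraction(2), Fraction(1, 2), Fraction(1, 4)),
          (Fraction(3, 2), Fraction(4, 5), Fraction(1, 20))]
result = all(discounted_martingale_step(Fraction(s), k, *prm)
             for prm in params for s in (1, 4, 50) for k in range(4))
print(result)   # True
```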


3.3.1 Portfolio Process 


The portfolio process is $\Delta = (\Delta_0, \Delta_1, \ldots, \Delta_{n-1})$, where

• $\Delta_k$ is the number of shares of stock held between times $k$ and $k + 1$.

• Each $\Delta_k$ is $\mathcal{F}_k$-measurable. (No insider trading.)

3.3.2 Self-financing Value of a Portfolio Process $\Delta$

• Start with nonrandom initial wealth $X_0$, which need not be 0.

• Define recursively
$$\begin{aligned}
X_{k+1} &= \Delta_k S_{k+1} + (1 + r)(X_k - \Delta_k S_k) \qquad (3.1) \\
&= (1 + r)X_k + \Delta_k(S_{k+1} - (1 + r)S_k). \qquad (3.2)
\end{aligned}$$

• Then each $X_k$ is $\mathcal{F}_k$-measurable.


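Theorem 3.12 Under $\widetilde{\mathbb{P}}$, the discounted self-financing portfolio process value $\{(1 + r)^{-k}X_k, \mathcal{F}_k\}_{k=0}^n$ is a martingale.

Proof: We have
$$(1 + r)^{-(k+1)}X_{k+1} = (1 + r)^{-k}X_k + \Delta_k\left((1 + r)^{-(k+1)}S_{k+1} - (1 + r)^{-k}S_k\right).$$

Therefore,
$$\begin{aligned}
\widetilde{\mathbb{E}}[(1 + r)^{-(k+1)}X_{k+1} \mid \mathcal{F}_k]
&= \widetilde{\mathbb{E}}[(1 + r)^{-k}X_k \mid \mathcal{F}_k] \\
&\quad + \widetilde{\mathbb{E}}[(1 + r)^{-(k+1)}\Delta_k S_{k+1} \mid \mathcal{F}_k] \\
&\quad - \widetilde{\mathbb{E}}[(1 + r)^{-k}\Delta_k S_k \mid \mathcal{F}_k] \\
&= (1 + r)^{-k}X_k \qquad \text{(property (b))} \\
&\quad + \Delta_k\,\widetilde{\mathbb{E}}[(1 + r)^{-(k+1)}S_{k+1} \mid \mathcal{F}_k] \qquad \text{(taking out what is known)} \\
&\quad - (1 + r)^{-k}\Delta_k S_k \qquad \text{(property (b))} \\
&= (1 + r)^{-k}X_k. \qquad \text{(Theorem 3.11)}
\end{aligned}$$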
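Theorem 3.12 holds for every adapted portfolio rule, not only for a hedging one. The sketch below runs recursion (3.2) on a three-period tree with the numbers $u = 2$, $d = 1/2$, $r = 1/4$, $S_0 = 4$. It uses a made-up rule `delta` and an arbitrary $X_0$, both hypothetical. It then checks the one-step martingale property at every node and the consequence $\widetilde{\mathbb{E}}[(1 + r)^{-n}X_n] = X_0$.

```python
from fractions import Fraction
from itertools import product

u, d, r, S0, n = Fraction(2), Fraction(1, 2), Fraction(1, 4), Fraction(4), 3
p_t = (1 + r - d) / (u - d)
q_t = 1 - p_t
X0 = Fraction(7, 2)                       # arbitrary initial wealth

def S(w):
    """Stock price after the tosses in the string w."""
    return S0 * u ** w.count("H") * d ** w.count("T")

def delta(prefix):
    """A made-up adapted rule: shares held depend only on the tosses seen so far."""
    return Fraction(prefix.count("H") + 1, 3) - Fraction(len(prefix), 5)

def wealth(w):
    """Self-financing wealth X_0, ..., X_len(w) along w, using recursion (3.2)."""
    X, path = X0, [X0]
    for k in range(len(w)):
        X = (1 + r) * X + delta(w[:k]) * (S(w[:k + 1]) - (1 + r) * S(w[:k]))
        path.append(X)
    return path

# One-step martingale check at every node of the tree.
ok = True
for k in range(n):
    for prefix in ("".join(t) for t in product("HT", repeat=k)):
        up, down = wealth(prefix + "H")[k + 1], wealth(prefix + "T")[k + 1]
        lhs = (p_t * up + q_t * down) / (1 + r) ** (k + 1)
        ok = ok and lhs == wealth(prefix)[k] / (1 + r) ** k
print(ok)   # True

# Consequence: the discounted terminal wealth has risk-neutral mean X0.
paths = ["".join(t) for t in product("HT", repeat=n)]
mean_disc = sum(p_t ** w.count("H") * q_t ** w.count("T") * wealth(w)[n] for w in paths) / (1 + r) ** n
print(mean_disc == X0)   # True
```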


3.4 Simple European Derivative Securities 


Definition 3.1 A simple European derivative security with expiration time $m$ is an $\mathcal{F}_m$-measurable random variable $V_m$. (Here, $m$ is less than or equal to $n$, the number of periods/coin tosses in the model.)

Definition 3.2 A simple European derivative security $V_m$ is said to be hedgeable if there exists a constant $X_0$ and a portfolio process $\Delta = (\Delta_0, \ldots, \Delta_{m-1})$ such that the self-financing value process $X_0, X_1, \ldots, X_m$ given by (3.2) satisfies
$$X_m(\omega) = V_m(\omega), \quad \forall \omega \in \Omega.$$
In this case, for $k = 0, 1, \ldots, m$, we call $X_k$ the APT value at time $k$ of $V_m$.


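Theorem 4.13 (Corollary to Theorem 3.12) If a simple European security $V_m$ is hedgeable, then for each $k = 0, 1, \ldots, m$, the APT value at time $k$ of $V_m$ is
$$V_k \triangleq (1 + r)^k\,\widetilde{\mathbb{E}}[(1 + r)^{-m}V_m \mid \mathcal{F}_k]. \qquad (4.1)$$

Proof: We first observe that if $\{M_k, \mathcal{F}_k;\, k = 0, 1, \ldots, m\}$ is a martingale, i.e., satisfies the martingale property
$$\widetilde{\mathbb{E}}[M_{k+1} \mid \mathcal{F}_k] = M_k$$
for each $k = 0, 1, \ldots, m - 1$, then we also have
$$\widetilde{\mathbb{E}}[M_m \mid \mathcal{F}_k] = M_k, \quad k = 0, 1, \ldots, m - 1. \qquad (4.2)$$

When $k = m - 1$, the equation (4.2) follows directly from the martingale property. For $k = m - 2$, we use the tower property to write
$$\widetilde{\mathbb{E}}[M_m \mid \mathcal{F}_{m-2}] = \widetilde{\mathbb{E}}\big[\widetilde{\mathbb{E}}[M_m \mid \mathcal{F}_{m-1}] \mid \mathcal{F}_{m-2}\big] = \widetilde{\mathbb{E}}[M_{m-1} \mid \mathcal{F}_{m-2}] = M_{m-2}.$$
We can continue by induction to obtain (4.2).

If the simple European security $V_m$ is hedgeable, then there is a portfolio process whose self-financing value process $X_0, X_1, \ldots, X_m$ satisfies $X_m = V_m$. By definition, $X_k$ is the APT value at time $k$ of $V_m$. Theorem 3.12 says that
$$X_0, (1 + r)^{-1}X_1, \ldots, (1 + r)^{-m}X_m$$
is a martingale, and so for each $k$,
$$(1 + r)^{-k}X_k = \widetilde{\mathbb{E}}[(1 + r)^{-m}X_m \mid \mathcal{F}_k] = \widetilde{\mathbb{E}}[(1 + r)^{-m}V_m \mid \mathcal{F}_k].$$

Therefore,
$$X_k = (1 + r)^k\,\widetilde{\mathbb{E}}[(1 + r)^{-m}V_m \mid \mathcal{F}_k].$$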


3.5 The Binomial Model is Complete 


Can a simple European derivative security always be hedged? It depends on the model. If the answer is "yes", the model is said to be complete. If the answer is "no", the model is called incomplete.

Theorem 5.14 The binomial model is complete. In particular, let $V_m$ be a simple European derivative security, and set
$$V_k(\omega_1, \ldots, \omega_k) \triangleq (1 + r)^k\,\widetilde{\mathbb{E}}[(1 + r)^{-m}V_m \mid \mathcal{F}_k](\omega_1, \ldots, \omega_k), \qquad (5.1)$$
$$\Delta_k(\omega_1, \ldots, \omega_k) \triangleq \frac{V_{k+1}(\omega_1, \ldots, \omega_k, H) - V_{k+1}(\omega_1, \ldots, \omega_k, T)}{S_{k+1}(\omega_1, \ldots, \omega_k, H) - S_{k+1}(\omega_1, \ldots, \omega_k, T)}. \qquad (5.2)$$

Starting with initial wealth $V_0 = \widetilde{\mathbb{E}}[(1 + r)^{-m}V_m]$, the self-financing value of the portfolio process $\Delta_0, \Delta_1, \ldots, \Delta_{m-1}$ is the process $V_0, V_1, \ldots, V_m$.

Proof: Let $V_0, \ldots, V_{m-1}$ and $\Delta_0, \ldots, \Delta_{m-1}$ be defined by (5.1) and (5.2). Set $X_0 = V_0$ and define the self-financing value of the portfolio process $\Delta_0, \ldots, \Delta_{m-1}$ by the recursive formula (3.1), equivalently (3.2):
$$X_{k+1} = \Delta_k S_{k+1} + (1 + r)(X_k - \Delta_k S_k).$$
We need to show that
$$X_k = V_k, \quad \forall k \in \{0, 1, \ldots, m\}. \qquad (5.3)$$

We proceed by induction. For $k = 0$, (5.3) holds by definition of $X_0$. Assume that (5.3) holds for some value of $k$, i.e., for each fixed $(\omega_1, \ldots, \omega_k)$, we have
$$X_k(\omega_1, \ldots, \omega_k) = V_k(\omega_1, \ldots, \omega_k).$$
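We need to show that
$$X_{k+1}(\omega_1, \ldots, \omega_k, H) = V_{k+1}(\omega_1, \ldots, \omega_k, H),$$
$$X_{k+1}(\omega_1, \ldots, \omega_k, T) = V_{k+1}(\omega_1, \ldots, \omega_k, T).$$
We prove the first equality; the second can be shown similarly. Note first that
$$\widetilde{\mathbb{E}}[(1 + r)^{-(k+1)}V_{k+1} \mid \mathcal{F}_k] = \widetilde{\mathbb{E}}\big[\widetilde{\mathbb{E}}[(1 + r)^{-m}V_m \mid \mathcal{F}_{k+1}] \mid \mathcal{F}_k\big] = \widetilde{\mathbb{E}}[(1 + r)^{-m}V_m \mid \mathcal{F}_k] = (1 + r)^{-k}V_k.$$

In other words, $\{(1 + r)^{-k}V_k\}_{k=0}^m$ is a martingale under $\widetilde{\mathbb{P}}$. In particular,
$$V_k(\omega_1, \ldots, \omega_k) = \widetilde{\mathbb{E}}[(1 + r)^{-1}V_{k+1} \mid \mathcal{F}_k](\omega_1, \ldots, \omega_k) = \frac{1}{1 + r}\left(\tilde{p}V_{k+1}(\omega_1, \ldots, \omega_k, H) + \tilde{q}V_{k+1}(\omega_1, \ldots, \omega_k, T)\right).$$

Since $(\omega_1, \ldots, \omega_k)$ will be fixed for the rest of the proof, we simplify notation by suppressing these symbols. For example, we write the last equation as
$$V_k = \frac{1}{1 + r}\left(\tilde{p}V_{k+1}(H) + \tilde{q}V_{k+1}(T)\right).$$

We compute
$$\begin{aligned}
X_{k+1}(H) &= \Delta_k S_{k+1}(H) + (1 + r)(X_k - \Delta_k S_k) \\
&= \Delta_k\left(S_{k+1}(H) - (1 + r)S_k\right) + (1 + r)V_k \\
&= \frac{V_{k+1}(H) - V_{k+1}(T)}{S_{k+1}(H) - S_{k+1}(T)}\left(S_{k+1}(H) - (1 + r)S_k\right) + \tilde{p}V_{k+1}(H) + \tilde{q}V_{k+1}(T) \\
&= \frac{V_{k+1}(H) - V_{k+1}(T)}{uS_k - dS_k}\left(uS_k - (1 + r)S_k\right) + \tilde{p}V_{k+1}(H) + \tilde{q}V_{k+1}(T) \\
&= \left(V_{k+1}(H) - V_{k+1}(T)\right)\frac{u - 1 - r}{u - d} + \tilde{p}V_{k+1}(H) + \tilde{q}V_{k+1}(T) \\
&= \left(V_{k+1}(H) - V_{k+1}(T)\right)\tilde{q} + \tilde{p}V_{k+1}(H) + \tilde{q}V_{k+1}(T) \\
&= V_{k+1}(H).
\end{aligned}$$
□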
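The backward induction (5.1) and (5.2), followed by the forward check $X_m = V_m$, can be sketched for any payoff that is a function of the tosses, including path-dependent ones. The sketch borrows the parameters and lookback payoff of Example 4.1 below. Keying values by toss prefixes in a dictionary is a choice made for this sketch, and it enumerates all $2^m$ paths, so it is only meant for small $m$.

```python
from fractions import Fraction
from itertools import product

u, d, r, S0, m = Fraction(2), Fraction(1, 2), Fraction(1, 4), Fraction(4), 2
p_t = (1 + r - d) / (u - d)
q_t = 1 - p_t

def S(w):
    """Stock price after the tosses in the string w."""
    return S0 * u ** w.count("H") * d ** w.count("T")

def payoff(w):
    """Lookback payoff max_{0<=k<=m} (S_k - 5)^+ (the payoff of Example 4.1)."""
    return max(max(S(w[:k]) for k in range(len(w) + 1)) - 5, 0)

def prefixes(k):
    return ["".join(t) for t in product("HT", repeat=k)]

# Backward induction: V_k by one-step risk-neutral averaging, Delta_k from (5.2).
V = {w: payoff(w) for w in prefixes(m)}
Delta = {}
for k in range(m - 1, -1, -1):
    for w in prefixes(k):
        V[w] = (p_t * V[w + "H"] + q_t * V[w + "T"]) / (1 + r)
        Delta[w] = (V[w + "H"] - V[w + "T"]) / (S(w + "H") - S(w + "T"))

# Forward pass: self-financing wealth from X_0 = V_0 using recursion (3.2).
def terminal_wealth(w):
    X = V[""]
    for k in range(len(w)):
        X = (1 + r) * X + Delta[w[:k]] * (S(w[:k + 1]) - (1 + r) * S(w[:k]))
    return X

print(V[""], Delta[""], Delta["H"])                          # 56/25 14/15 2/3
print(all(terminal_wealth(w) == V[w] for w in prefixes(m)))  # True: X_m = V_m
```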
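Chapter 4

The Markov Property

4.1 Binomial Model Pricing and Hedging

Recall that $V_m$ is the given simple European derivative security, and the value and portfolio processes are given by:
$$V_k = (1 + r)^k\,\widetilde{\mathbb{E}}[(1 + r)^{-m}V_m \mid \mathcal{F}_k], \quad k = 0, 1, \ldots, m - 1,$$
$$\Delta_k(\omega_1, \ldots, \omega_k) = \frac{V_{k+1}(\omega_1, \ldots, \omega_k, H) - V_{k+1}(\omega_1, \ldots, \omega_k, T)}{S_{k+1}(\omega_1, \ldots, \omega_k, H) - S_{k+1}(\omega_1, \ldots, \omega_k, T)}, \quad k = 0, 1, \ldots, m - 1.$$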


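Example 4.1 (Lookback Option) $u = 2$, $d = 0.5$, $r = 0.25$, $S_0 = 4$, $\tilde{p} = \frac{1 + r - d}{u - d} = 0.5$, $\tilde{q} = 1 - \tilde{p} = 0.5$.
Consider a simple European derivative security with expiration 2, with payoff given by (see Fig. 4.1):
$$V_2 = \max_{0 \le k \le 2}(S_k - 5)^+.$$
Notice that
$$V_2(HH) = 11, \quad V_2(HT) = 3, \quad V_2(TH) = 0, \quad V_2(TT) = 0.$$
The payoff is thus "path dependent". Working backward in time, we have:
$$V_1(H) = \frac{1}{1 + r}\left[\tilde{p}V_2(HH) + \tilde{q}V_2(HT)\right] = \frac{4}{5}\left[0.5 \times 11 + 0.5 \times 3\right] = 5.60,$$
$$V_1(T) = \frac{4}{5}\left[0.5 \times 0 + 0.5 \times 0\right] = 0,$$
$$V_0 = \frac{4}{5}\left[0.5 \times 5.60 + 0.5 \times 0\right] = 2.24.$$

Using these values, we can now compute (rounding to two decimals):
$$\Delta_0 = \frac{V_1(H) - V_1(T)}{S_1(H) - S_1(T)} = 0.93, \qquad \Delta_1(H) = \frac{V_2(HH) - V_2(HT)}{S_2(HH) - S_2(HT)} = 0.67.$$

[Figure 4.1: Stock price tree for Example 4.1, showing $S_1(H) = 8$, $S_2(HT) = 4$, and $S_2(TH) = 4$.]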


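Working forward in time, we can check that

$$
\begin{aligned}
X_1(H) &= \Delta_0 S_1(H) + (1+r)(X_0 - \Delta_0 S_0) = 5.59; & V_1(H) &= 5.60,\\
X_1(T) &= \Delta_0 S_1(T) + (1+r)(X_0 - \Delta_0 S_0) = 0.01; & V_1(T) &= 0,\\
X_2(HH) &= \Delta_1(H) S_2(HH) + (1+r)(X_1(H) - \Delta_1(H) S_1(H)) = 11.01; & V_2(HH) &= 11,
\end{aligned}
$$

etc. The small discrepancies come from rounding $\Delta_0$ and $\Delta_1(H)$ to two decimal places; with the exact values $\Delta_0 = 14/15$ and $\Delta_1(H) = 2/3$, the wealth matches the option value exactly.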
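The backward recursion and forward check of Example 4.1 can be reproduced with a short Python sketch that enumerates the coin-toss paths directly (the function names are hypothetical). Because it keeps the unrounded hedge ratios, the forward wealth matches $V_1(H) = 5.60$ and $V_1(T) = 0$ up to floating-point error, rather than 5.59 and 0.01.

```python
from itertools import product

u, d, r, S0, n = 2.0, 0.5, 0.25, 4.0, 2
p = (1 + r - d) / (u - d)   # risk-neutral probability of H (0.5 here)
q = 1 - p


def prices(path):
    """Stock prices S_0, ..., S_k along a tuple of tosses such as ('H', 'T')."""
    s = [S0]
    for toss in path:
        s.append(s[-1] * (u if toss == "H" else d))
    return s


def payoff(path):
    """Lookback payoff V_2 = max_{0 <= k <= 2} (S_k - 5)^+."""
    return max(max(s - 5.0, 0.0) for s in prices(path))


# V[path] is the value at time len(path) on that path.
V = {path: payoff(path) for path in product("HT", repeat=n)}
for k in range(n - 1, -1, -1):                 # step backward in time
    for path in product("HT", repeat=k):
        V[path] = (p * V[path + ("H",)] + q * V[path + ("T",)]) / (1 + r)


def delta(path):
    """Delta_k on the node reached by `path`."""
    up, down = path + ("H",), path + ("T",)
    return (V[up] - V[down]) / (prices(up)[-1] - prices(down)[-1])


# Forward check of the hedge from time 0 to time 1 with the exact Delta_0.
X0, D0 = V[()], delta(())
X1 = {t: D0 * prices((t,))[-1] + (1 + r) * (X0 - D0 * S0) for t in "HT"}
print(V[()], V[("H",)], D0, delta(("H",)), X1)
```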


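Example 4.2 (European Call) Let $u = 2$, $d = \frac{1}{2}$, $r = \frac{1}{4}$, $S_0 = 4$, $\tilde{p} = \tilde{q} = \frac{1}{2}$, and consider a European call with expiration time 2 and payoff function

$$V_2 = (S_2 - 5)^+.$$

Note that

$$V_2(HH) = 11,\quad V_2(HT) = V_2(TH) = 0,\quad V_2(TT) = 0,$$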


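$$
\begin{aligned}
V_1(H) &= \frac{4}{5}\left[\tfrac{1}{2} \times 11 + \tfrac{1}{2} \times 0\right] = 4.40,\\
V_1(T) &= \frac{4}{5}\left[\tfrac{1}{2} \times 0 + \tfrac{1}{2} \times 0\right] = 0,\\
V_0 &= \frac{4}{5}\left[\tfrac{1}{2} \times 4.40 + \tfrac{1}{2} \times 0\right] = 1.76.
\end{aligned}
$$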
Define $v_k(x)$ to be the value of the call at time $k$ when $S_k = x$. Then

$$v_1(x) = \frac{2}{5}\left[v_2(2x) + v_2(x/2)\right],$$

$$v_0(x) = \frac{2}{5}\left[v_1(2x) + v_1(x/2)\right].$$
In particular,

$$v_2(16) = 11,\quad v_2(4) = 0,\quad v_2(1) = 0,$$

$$v_1(8) = \frac{2}{5}[11 + 0] = 4.40,$$

$$v_1(2) = \frac{2}{5}[0 + 0] = 0,$$

$$v_0(4) = \frac{2}{5}[4.40 + 0] = 1.76.$$
Let $\delta_k(x)$ be the number of shares in the hedging portfolio at time $k$ when $S_k = x$. Then

$$\delta_k(x) = \frac{v_{k+1}(2x) - v_{k+1}(x/2)}{2x - x/2},\qquad k = 0,1.$$
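The Markov recursion of Example 4.2 translates almost line by line into code. In this Python sketch (function names hypothetical), memoizing on $(k, x)$ means each distinct stock price is evaluated once, which is exactly the saving the Markov structure buys.

```python
from functools import lru_cache

u, d, r = 2.0, 0.5, 0.25
p = q = 0.5          # risk-neutral probabilities of Example 4.2
N, K = 2, 5.0        # expiration and strike of the call


@lru_cache(maxsize=None)
def v(k, x):
    """v_k(x): value at time k of the call with payoff (S_N - K)^+, given S_k = x."""
    if k == N:
        return max(x - K, 0.0)
    return (p * v(k + 1, u * x) + q * v(k + 1, d * x)) / (1 + r)


def shares(k, x):
    """delta_k(x) = (v_{k+1}(2x) - v_{k+1}(x/2)) / (2x - x/2)."""
    return (v(k + 1, u * x) - v(k + 1, d * x)) / (u * x - d * x)


print(v(0, 4.0), v(1, 8.0), v(1, 2.0), shares(0, 4.0), shares(1, 8.0))
```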


4.2 Computational Issues 


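For a model with $n$ periods (coin tosses), $\Omega$ has $2^n$ elements. For period $k$, we must solve $2^k$ equations of the form

$$V_k(\omega_1,\ldots,\omega_k) = \frac{1}{1+r}\left[\tilde{p}\,V_{k+1}(\omega_1,\ldots,\omega_k,H) + \tilde{q}\,V_{k+1}(\omega_1,\ldots,\omega_k,T)\right].$$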


For example, a three-month option has 66 trading days. If each day is taken to be one period, then $n = 66$ and $2^{66} \approx 7 \times 10^{19}$.


There are three possible ways to deal with this problem: 


1. Simulation. We have, for example, that

$$V_0 = (1+r)^{-n}\,\tilde{\mathbb{E}}V_n,$$

and so we could compute $V_0$ by simulation. More specifically, we could simulate $n$ coin tosses $\omega = (\omega_1,\ldots,\omega_n)$ under the risk-neutral probability measure and store the value of $V_n(\omega)$. We could repeat this many times and take the average value of $V_n$ as an approximation to $\tilde{\mathbb{E}}V_n$.


2. Approximate a many-period model by a continuous-time model. Then we can use calculus 
and partial differential equations. We’ll get to that. 


3. Look for Markov structure. Example 4.2 has this. In period 2, the option in Example 4.2 has three possible values $v_2(16)$, $v_2(4)$, $v_2(1)$, rather than four possible values $V_2(HH)$, $V_2(HT)$, $V_2(TH)$, $V_2(TT)$. If there were 66 periods, then in period 66 there would be 67 possible stock price values (since the final price depends only on the number of up-ticks of the stock price, i.e., heads, so far) and hence only 67 possible option values, rather than $2^{66} \approx 7 \times 10^{19}$.
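The first and third approaches can be compared with a Python sketch for a European call. The lattice pricer exploits the Markov structure: at time $k$ it stores only the $k+1$ values indexed by the number of heads. The Monte Carlo pricer simulates tosses under $\tilde{\mathbb{P}}$ and averages the discounted payoff. The 66-period parameters ($S_0 = K = 100$, $u = 1.01$, $d = 1/1.01$, $r = 0.0002$) are hypothetical, chosen only so that $0 < d < 1 + r < u$; the function names are also hypothetical.

```python
import math
import random


def lattice_price(S0, K, u, d, r, n):
    """European call price by backward induction on the recombining tree.

    The value at time k depends only on the number of heads j so far,
    so we keep k + 1 numbers instead of 2**k.
    """
    p = (1 + r - d) / (u - d)
    q = 1 - p
    v = [max(S0 * u**j * d**(n - j) - K, 0.0) for j in range(n + 1)]   # v[j]: j heads
    for k in range(n - 1, -1, -1):
        v = [(p * v[j + 1] + q * v[j]) / (1 + r) for j in range(k + 1)]
    return v[0]


def monte_carlo_price(S0, K, u, d, r, n, trials, seed=0):
    """Estimate (1+r)^{-n} E~[V_n] by simulating n tosses under the risk-neutral measure.

    Returns the estimate and its standard error.
    """
    rng = random.Random(seed)
    p = (1 + r - d) / (u - d)
    total = total_sq = 0.0
    for _ in range(trials):
        heads = sum(rng.random() < p for _ in range(n))
        payoff = max(S0 * u**heads * d**(n - heads) - K, 0.0)
        total += payoff
        total_sq += payoff * payoff
    mean = total / trials
    var = max(total_sq / trials - mean**2, 0.0)
    disc = (1 + r) ** -n
    return disc * mean, disc * math.sqrt(var / trials)


# Hypothetical daily parameters for a 66-period model (not from the text).
S0, K, u, d, r, n = 100.0, 100.0, 1.01, 1 / 1.01, 0.0002, 66
exact = lattice_price(S0, K, u, d, r, n)
estimate, std_err = monte_carlo_price(S0, K, u, d, r, n, trials=20_000)
print(f"lattice {exact:.4f}, Monte Carlo {estimate:.4f} +/- {std_err:.4f}")
```

The lattice does $O(n^2)$ work, while the simulation error shrinks only like one over the square root of the number of trials, which is one reason to look for Markov structure first.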
4.3 Markov Processes


Technical condition always present: We consider only functions on $\mathbb{R}$ and subsets of $\mathbb{R}$ which are Borel-measurable, i.e., we only consider subsets $A$ of $\mathbb{R}$ that are in $\mathcal{B}$ and functions $g : \mathbb{R} \to \mathbb{R}$ such that $g^{-1}$ is a function $\mathcal{B} \to \mathcal{B}$.


Definition 4.1 Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space. Let $\{\mathcal{F}_k\}_{k=0}^n$ be a filtration under $\mathcal{F}$. Let $\{X_k\}_{k=0}^n$ be a stochastic process on $(\Omega, \mathcal{F}, \mathbb{P})$. This process is said to be Markov if:

• The stochastic process $\{X_k\}$ is adapted to the filtration $\{\mathcal{F}_k\}$, and

• (The Markov Property). For each $k = 0,1,\ldots,n-1$, the distribution of $X_{k+1}$ conditioned on $\mathcal{F}_k$ is the same as the distribution of $X_{k+1}$ conditioned on $X_k$.


4.3.1 Different ways to write the Markov property 


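(a) (Agreement of distributions). For every $A \in \mathcal{B} \triangleq \mathcal{B}(\mathbb{R})$, we have

$$
\begin{aligned}
\mathbb{P}(X_{k+1} \in A \mid \mathcal{F}_k) &= \mathbb{E}[\mathbb{I}_A(X_{k+1}) \mid \mathcal{F}_k]\\
&= \mathbb{E}[\mathbb{I}_A(X_{k+1}) \mid X_k]\\
&= \mathbb{P}[X_{k+1} \in A \mid X_k].
\end{aligned}
$$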


(b) (Agreement of expectations of all functions). For every (Borel-measurable) function $h : \mathbb{R} \to \mathbb{R}$ for which $\mathbb{E}|h(X_{k+1})| < \infty$, we have

$$\mathbb{E}[h(X_{k+1}) \mid \mathcal{F}_k] = \mathbb{E}[h(X_{k+1}) \mid X_k].$$
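(c) (Agreement of Laplace transforms.) For every $u \in \mathbb{R}$ for which $\mathbb{E}\,e^{uX_{k+1}} < \infty$, we have

$$\mathbb{E}\left[e^{uX_{k+1}} \mid \mathcal{F}_k\right] = \mathbb{E}\left[e^{uX_{k+1}} \mid X_k\right].$$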


(If we fix $u$ and define $h(x) = e^{ux}$, then the equations in (b) and (c) are the same. However in (b) we have a condition which holds for every function $h$, and in (c) we assume this condition only for functions $h$ of the form $h(x) = e^{ux}$. A main result in the theory of Laplace transforms is that if the equation holds for every $h$ of this special form, then it holds for every $h$, i.e., (c) implies (b).)


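(d) (Agreement of characteristic functions) For every $u \in \mathbb{R}$, we have

$$\mathbb{E}\left[e^{iuX_{k+1}} \mid \mathcal{F}_k\right] = \mathbb{E}\left[e^{iuX_{k+1}} \mid X_k\right],$$

where $i = \sqrt{-1}$. (Since $|e^{iux}| = |\cos ux + i \sin ux| = 1$, we don't need to assume that $\mathbb{E}|e^{iuX_{k+1}}| < \infty$.)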




Remark 4.1 In every case of the Markov properties where $\mathbb{E}[\ldots \mid X_k]$ appears, we could just as well write $g(X_k)$ for some function $g$. For example, form (a) of the Markov property can be restated as:

For every $A \in \mathcal{B}$, we have

$$\mathbb{P}(X_{k+1} \in A \mid \mathcal{F}_k) = g(X_k),$$

where $g$ is a function that depends on the set $A$.


Conditions (a)-(d) are equivalent. The Markov property as stated in (a)-(d) involves the process at a “current” time $k$ and one future time $k+1$. Conditions (a)-(d) are also equivalent to conditions involving the process at time $k$ and multiple future times. We write these apparently stronger but actually equivalent conditions below.


Consequences of the Markov property. Let j be a positive integer. 


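(A) For every $A_{k+1} \subseteq \mathbb{R}, \ldots, A_{k+j} \subseteq \mathbb{R}$,

$$\mathbb{P}[X_{k+1} \in A_{k+1}, \ldots, X_{k+j} \in A_{k+j} \mid \mathcal{F}_k] = \mathbb{P}[X_{k+1} \in A_{k+1}, \ldots, X_{k+j} \in A_{k+j} \mid X_k].$$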


(A′) For every $A \subseteq \mathbb{R}^j$,

$$\mathbb{P}[(X_{k+1}, \ldots, X_{k+j}) \in A \mid \mathcal{F}_k] = \mathbb{P}[(X_{k+1}, \ldots, X_{k+j}) \in A \mid X_k].$$


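(B) For every function $h : \mathbb{R}^j \to \mathbb{R}$ for which $\mathbb{E}|h(X_{k+1}, \ldots, X_{k+j})| < \infty$, we have

$$\mathbb{E}[h(X_{k+1}, \ldots, X_{k+j}) \mid \mathcal{F}_k] = \mathbb{E}[h(X_{k+1}, \ldots, X_{k+j}) \mid X_k].$$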


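(C) For every $u = (u_{k+1}, \ldots, u_{k+j}) \in \mathbb{R}^j$ for which $\mathbb{E}\,e^{u_{k+1}X_{k+1} + \cdots + u_{k+j}X_{k+j}} < \infty$, we have

$$\mathbb{E}\left[e^{u_{k+1}X_{k+1} + \cdots + u_{k+j}X_{k+j}} \mid \mathcal{F}_k\right] = \mathbb{E}\left[e^{u_{k+1}X_{k+1} + \cdots + u_{k+j}X_{k+j}} \mid X_k\right].$$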


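(D) For every $u = (u_{k+1}, \ldots, u_{k+j}) \in \mathbb{R}^j$ we have

$$\mathbb{E}\left[e^{i(u_{k+1}X_{k+1} + \cdots + u_{k+j}X_{k+j})} \mid \mathcal{F}_k\right] = \mathbb{E}\left[e^{i(u_{k+1}X_{k+1} + \cdots + u_{k+j}X_{k+j})} \mid X_k\right].$$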


Once again, every expression of the form $\mathbb{E}(\ldots \mid X_k)$ can also be written as $g(X_k)$, where the function $g$ depends on the random variable represented by $\ldots$ in this expression.


Remark. All these Markov properties have analogues for vector-valued processes. 
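Proof that (b) $\Rightarrow$ (A) (with $j = 2$ in (A)). Assume (b). Then (a) also holds (take $h = \mathbb{I}_A$). Consider

$$
\begin{aligned}
\mathbb{P}[X_{k+1} \in A_{k+1}, X_{k+2} \in A_{k+2} \mid \mathcal{F}_k]
&= \mathbb{E}[\mathbb{I}_{A_{k+1}}(X_{k+1})\,\mathbb{I}_{A_{k+2}}(X_{k+2}) \mid \mathcal{F}_k] && \text{(Definition of conditional probability)}\\
&= \mathbb{E}\big[\mathbb{E}[\mathbb{I}_{A_{k+1}}(X_{k+1})\,\mathbb{I}_{A_{k+2}}(X_{k+2}) \mid \mathcal{F}_{k+1}] \,\big|\, \mathcal{F}_k\big] && \text{(Tower property)}\\
&= \mathbb{E}\big[\mathbb{I}_{A_{k+1}}(X_{k+1})\,\mathbb{E}[\mathbb{I}_{A_{k+2}}(X_{k+2}) \mid \mathcal{F}_{k+1}] \,\big|\, \mathcal{F}_k\big] && \text{(Taking out what is known)}\\
&= \mathbb{E}\big[\mathbb{I}_{A_{k+1}}(X_{k+1})\,\mathbb{E}[\mathbb{I}_{A_{k+2}}(X_{k+2}) \mid X_{k+1}] \,\big|\, \mathcal{F}_k\big] && \text{(Markov property, form (a))}\\
&= \mathbb{E}[\mathbb{I}_{A_{k+1}}(X_{k+1})\,g(X_{k+1}) \mid \mathcal{F}_k] && \text{(Remark 4.1)}\\
&= \mathbb{E}[\mathbb{I}_{A_{k+1}}(X_{k+1})\,g(X_{k+1}) \mid X_k] && \text{(Markov property, form (b))}
\end{aligned}
$$

Now take conditional expectation on both sides of the above equation, conditioned on $\sigma(X_k)$, and use the tower property on the left, to obtain

$$\mathbb{P}[X_{k+1} \in A_{k+1}, X_{k+2} \in A_{k+2} \mid X_k] = \mathbb{E}[\mathbb{I}_{A_{k+1}}(X_{k+1})\,g(X_{k+1}) \mid X_k]. \tag{3.1}$$

Since both

$$\mathbb{P}[X_{k+1} \in A_{k+1}, X_{k+2} \in A_{k+2} \mid \mathcal{F}_k]$$

and

$$\mathbb{P}[X_{k+1} \in A_{k+1}, X_{k+2} \in A_{k+2} \mid X_k]$$

are equal to the RHS of (3.1), they are equal to each other, and this is property (A) with $j = 2$. ∎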


Example 4.3 It is intuitively clear that the stock price process in the binomial model is a Markov process. We will formally prove this later. If we want to estimate the distribution of $S_{k+1}$ based on the information in $\mathcal{F}_k$, the only relevant piece of information is the value of $S_k$. For example,

$$\tilde{\mathbb{E}}[S_{k+1} \mid \mathcal{F}_k] = (\tilde{p}u + \tilde{q}d)S_k = (1+r)S_k \tag{3.2}$$

is a function of $S_k$. Note however that form (b) of the Markov property is stronger than (3.2); the Markov property requires that for any function $h$,

$$\tilde{\mathbb{E}}[h(S_{k+1}) \mid \mathcal{F}_k]$$

is a function of $S_k$. Equation (3.2) is the case of $h(x) = x$.


Consider a model with 66 periods and a simple European derivative security whose payoff at time 66 is

$$V_{66} = \frac{1}{3}\left(S_{64} + S_{65} + S_{66}\right).$$
The value of this security at time 50 is

$$
\begin{aligned}
V_{50} &= (1+r)^{50}\,\tilde{\mathbb{E}}[(1+r)^{-66}V_{66} \mid \mathcal{F}_{50}]\\
&= (1+r)^{-16}\,\tilde{\mathbb{E}}[V_{66} \mid S_{50}],
\end{aligned}
$$

because the stock price process is Markov. (We are using form (B) of the Markov property here.) In other words, the $\mathcal{F}_{50}$-measurable random variable $V_{50}$ can be written as

$$V_{50}(\omega_1,\ldots,\omega_{50}) = g(S_{50}(\omega_1,\ldots,\omega_{50}))$$

for some function $g$, which we can determine with a bit of work. ∎
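The “bit of work” can be carried out explicitly. Since $(1+r)^{-k}S_k$ is a martingale under $\tilde{\mathbb{P}}$, we have $\tilde{\mathbb{E}}[S_j \mid \mathcal{F}_{50}] = (1+r)^{j-50}S_{50}$ for $j \ge 50$, hence

$$g(x) = \frac{(1+r)^{-16}}{3}\left[(1+r)^{14} + (1+r)^{15} + (1+r)^{16}\right]x = \frac{1}{3}\left[1 + (1+r)^{-1} + (1+r)^{-2}\right]x.$$

The Python sketch below checks this against brute-force enumeration of the last 16 tosses. The parameters $u = 2$, $d = 1/2$, $r = 1/4$, $S_{50} = 4$ are assumed for illustration, and the function names are hypothetical.

```python
from itertools import product


def v50_brute_force(s50, u, d, r):
    """(1+r)^{-16} E~[(S_64 + S_65 + S_66) / 3 | S_50 = s50], enumerating the last 16 tosses."""
    p = (1 + r - d) / (u - d)
    q = 1 - p
    total = 0.0
    for tosses in product((True, False), repeat=16):   # True means H
        s, prob, running = s50, 1.0, 0.0
        for time, head in enumerate(tosses, start=51):
            s *= u if head else d
            prob *= p if head else q
            if time >= 64:                              # accumulate S_64, S_65, S_66
                running += s
        total += prob * running / 3
    return total / (1 + r) ** 16


def g(x, r):
    """Closed form obtained from the martingale property of (1+r)^{-k} S_k."""
    return (1 + 1 / (1 + r) + 1 / (1 + r) ** 2) * x / 3


u, d, r = 2.0, 0.5, 0.25      # assumed illustrative parameters
brute = v50_brute_force(4.0, u, d, r)
print(brute, g(4.0, r))
```

Note that $g$ is linear here only because the payoff is linear in the stock prices; for a payoff such as $(V_{66} - K)^+$ one would fall back on a backward recursion.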


4.4 Showing that a process is Markov 


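Definition 4.2 (Independence) Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space, and let $\mathcal{G}$ and $\mathcal{H}$ be sub-$\sigma$-algebras of $\mathcal{F}$. We say that $\mathcal{G}$ and $\mathcal{H}$ are independent if for every $A \in \mathcal{G}$ and $B \in \mathcal{H}$, we have

$$\mathbb{P}(A \cap B) = \mathbb{P}(A)\,\mathbb{P}(B).$$

We say that a random variable $X$ is independent of a $\sigma$-algebra $\mathcal{G}$ if $\sigma(X)$, the $\sigma$-algebra generated by $X$, is independent of $\mathcal{G}$.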


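Example 4.4 Consider the two-period binomial model. Recall that $\mathcal{F}_1$ is the $\sigma$-algebra of sets determined by the first toss, i.e., $\mathcal{F}_1$ contains the four sets

$$A_H \triangleq \{HH, HT\},\quad A_T \triangleq \{TH, TT\},\quad \emptyset,\quad \Omega.$$

Let $\mathcal{H}$ be the $\sigma$-algebra of sets determined by the second toss, i.e., $\mathcal{H}$ contains the four sets

$$\{HH, TH\},\quad \{HT, TT\},\quad \emptyset,\quad \Omega.$$

Then $\mathcal{F}_1$ and $\mathcal{H}$ are independent. For example, if we take $A = \{HH, HT\}$ from $\mathcal{F}_1$ and $B = \{HH, TH\}$ from $\mathcal{H}$, then $\mathbb{P}(A \cap B) = \mathbb{P}(HH) = p^2$ and

$$\mathbb{P}(A)\,\mathbb{P}(B) = (p^2 + pq)(p^2 + pq) = p^2(p+q)^2 = p^2.$$

Note that $\mathcal{F}_1$ and $S_2$ are not independent (unless $p = 1$ or $p = 0$). For example, one of the sets in $\sigma(S_2)$ is $\{\omega; S_2(\omega) = u^2 S_0\} = \{HH\}$. If we take $A = \{HH, HT\}$ from $\mathcal{F}_1$ and $B = \{HH\}$ from $\sigma(S_2)$, then $\mathbb{P}(A \cap B) = \mathbb{P}(HH) = p^2$, but

$$\mathbb{P}(A)\,\mathbb{P}(B) = (p^2 + pq)p^2 = p^3(p+q) = p^3.$$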
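Both computations can be checked exhaustively, since $\mathcal{F}_1$ and $\mathcal{H}$ each contain only four sets. The Python sketch below (helper names hypothetical; $p = 1/3$ is an assumed value, and any $0 < p < 1$ behaves the same way) tests every pair $A \in \mathcal{F}_1$, $B \in \mathcal{H}$, and then the failing pair from $\mathcal{F}_1$ and $\sigma(S_2)$.

```python
from fractions import Fraction


def prob(event, p):
    """P(event) for a set of two-toss outcomes such as {'HH', 'HT'}; P(H) = p on each independent toss."""
    q = 1 - p
    return sum(
        ((p if w[0] == "H" else q) * (p if w[1] == "H" else q) for w in event),
        Fraction(0),
    )


def independent(A, B, p):
    return prob(A & B, p) == prob(A, p) * prob(B, p)


p = Fraction(1, 3)                     # assumed value; any 0 < p < 1 behaves the same way
omega = {"HH", "HT", "TH", "TT"}
F1 = [set(), {"HH", "HT"}, {"TH", "TT"}, omega]    # sets determined by the first toss
H = [set(), {"HH", "TH"}, {"HT", "TT"}, omega]     # sets determined by the second toss

f1_indep_h = all(independent(A, B, p) for A in F1 for B in H)
f1_indep_s2_atom = independent({"HH", "HT"}, {"HH"}, p)   # {HH} = {S_2 = u^2 S_0}
print(f1_indep_h, f1_indep_s2_atom)
```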


The following lemma will be very useful in showing that a process is Markov: 


Lemma 4.15 (Independence Lemma) Let $X$ and $Y$ be random variables on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$. Let $\mathcal{G}$ be a sub-$\sigma$-algebra of $\mathcal{F}$. Assume

• $X$ is independent of $\mathcal{G}$;

• $Y$ is $\mathcal{G}$-measurable.

Let $f(x,y)$ be a function of two variables, and define

$$g(y) \triangleq \mathbb{E}f(X,y).$$

Then

$$\mathbb{E}[f(X,Y) \mid \mathcal{G}] = g(Y).$$


Remark. In this lemma and the following discussion, capital letters denote random variables and 
lower case letters denote nonrandom variables. 


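Example 4.5 (Showing the stock price process is Markov) Consider an $n$-period binomial model. Fix a time $k$ and define $X \triangleq \frac{S_{k+1}}{S_k}$ and $\mathcal{G} \triangleq \mathcal{F}_k$. Then $X = u$ if $\omega_{k+1} = H$ and $X = d$ if $\omega_{k+1} = T$. Since $X$ depends only on the $(k+1)$st toss, $X$ is independent of $\mathcal{G}$. Define $Y \triangleq S_k$, so that $Y$ is $\mathcal{G}$-measurable. Let $h$ be any function and set $f(x,y) \triangleq h(xy)$. Then

$$g(y) \triangleq \mathbb{E}f(X,y) = \mathbb{E}h(Xy) = p\,h(uy) + q\,h(dy).$$

The Independence Lemma asserts that

$$
\begin{aligned}
\mathbb{E}[h(S_{k+1}) \mid \mathcal{F}_k] &= \mathbb{E}\left[h\left(\frac{S_{k+1}}{S_k} \cdot S_k\right) \,\middle|\, \mathcal{F}_k\right]\\
&= \mathbb{E}[f(X,Y) \mid \mathcal{G}]\\
&= g(Y)\\
&= p\,h(uS_k) + q\,h(dS_k).
\end{aligned}
$$

This shows the stock price is Markov. Indeed, if we condition both sides of the above equation on $\sigma(S_k)$ and use the tower property on the left and the fact that the right hand side is $\sigma(S_k)$-measurable, we obtain

$$\mathbb{E}[h(S_{k+1}) \mid S_k] = p\,h(uS_k) + q\,h(dS_k).$$

Thus $\mathbb{E}[h(S_{k+1}) \mid \mathcal{F}_k]$ and $\mathbb{E}[h(S_{k+1}) \mid S_k]$ are equal and form (b) of the Markov property is proved.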


Not only have we shown that the stock price process is Markov, but we have also obtained a formula for $\mathbb{E}[h(S_{k+1}) \mid \mathcal{F}_k]$ as a function of $S_k$. This is a special case of Remark 4.1.
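The formula can be confirmed by brute force in a small model. The Python sketch below (names hypothetical) computes $\mathbb{E}[h(S_{k+1}) \mid \mathcal{F}_k]$ atom by atom in a three-period model, by averaging over all completions of the first $k$ tosses, and compares it with $p\,h(uS_k) + q\,h(dS_k)$. The parameters $u = 2$, $d = 1/2$, $p = 1/3$, $S_0 = 4$ and the test function $h(x) = ((x-5)^+)^2$ are assumptions.

```python
from fractions import Fraction
from itertools import product

u, d, p = Fraction(2), Fraction(1, 2), Fraction(1, 3)   # assumed parameters; p need not be 1/2
q = 1 - p
S0, n = Fraction(4), 3


def S(path):
    """Stock price after the tosses in `path`, a string such as 'HT'."""
    s = S0
    for toss in path:
        s *= u if toss == "H" else d
    return s


def h(x):
    """An arbitrary test function of the next price (hypothetical choice)."""
    return max(x - 5, 0) ** 2


def cond_exp(prefix):
    """E[h(S_{k+1}) | F_k] on the atom of F_k fixed by the first k = len(prefix) tosses."""
    k = len(prefix)
    num = den = Fraction(0)
    for rest in product("HT", repeat=n - k):
        w = prefix + "".join(rest)
        weight = p ** w.count("H") * q ** w.count("T")
        num += weight * h(S(w[: k + 1]))
        den += weight
    return num / den


for k in range(n):
    for prefix in map("".join, product("HT", repeat=k)):
        # The Independence Lemma formula, which depends on the path only through S_k.
        assert cond_exp(prefix) == p * h(u * S(prefix)) + q * h(d * S(prefix))
print(cond_exp("HT"), cond_exp("TH"))   # same S_2 = 4, so same conditional expectation
```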


4.5 Application to Exotic Options 
Consider an $n$-period binomial model. Define the running maximum of the stock price to be

$$M_k \triangleq \max_{0 \le j \le k} S_j.$$


Consider a simple European derivative security with payoff at time $n$ of $v_n(S_n, M_n)$.


Examples: 
• $v_n(S_n, M_n) = (M_n - K)^+$ (Lookback option);
• $v_n(S_n, M_n) = \mathbb{I}_{\{M_n \ge B\}}(S_n - K)^+$ (Knock-in Barrier option).


Lemma 5.16 The two-dimensional process $\{(S_k, M_k)\}_{k=0}^n$ is Markov. (Here we are working under the risk-neutral measure $\tilde{\mathbb{P}}$, although that does not matter.)


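Proof: Fix $k$. We have

$$M_{k+1} = M_k \vee S_{k+1},$$

where $\vee$ indicates the maximum of two quantities. Let $Z \triangleq \frac{S_{k+1}}{S_k}$, so

$$\tilde{\mathbb{P}}(Z = u) = \tilde{p},\qquad \tilde{\mathbb{P}}(Z = d) = \tilde{q},$$

and $Z$ is independent of $\mathcal{F}_k$. Let $h(x,y)$ be a function of two variables. We have

$$
\begin{aligned}
h(S_{k+1}, M_{k+1}) &= h(S_{k+1}, M_k \vee S_{k+1})\\
&= h(ZS_k, M_k \vee (ZS_k)).
\end{aligned}
$$

Define

$$
\begin{aligned}
g(x,y) &\triangleq \tilde{\mathbb{E}}\,h(Zx, y \vee (Zx))\\
&= \tilde{p}\,h(ux, y \vee (ux)) + \tilde{q}\,h(dx, y \vee (dx)).
\end{aligned}
$$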


The Independence Lemma implies

$$\tilde{\mathbb{E}}[h(S_{k+1}, M_{k+1}) \mid \mathcal{F}_k] = g(S_k, M_k) = \tilde{p}\,h(uS_k, M_k \vee (uS_k)) + \tilde{q}\,h(dS_k, M_k),$$

the second equality being a consequence of the fact that $M_k \vee dS_k = M_k$ (because $dS_k < S_k \le M_k$ when $d < 1$). Since the RHS is a function of $(S_k, M_k)$, we have proved the Markov property (form (b)) for this two-dimensional process. ∎


Continuing with the exotic option of the previous Lemma, let $V_k$ denote the value of the derivative security at time $k$. Since $(1+r)^{-k}V_k$ is a martingale under $\tilde{\mathbb{P}}$, we have

$$V_k = \frac{1}{1+r}\,\tilde{\mathbb{E}}[V_{k+1} \mid \mathcal{F}_k],\qquad k = 0,1,\ldots,n-1.$$


At the final time, we have

$$V_n = v_n(S_n, M_n).$$


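Stepping back one step, we can compute

$$
\begin{aligned}
V_{n-1} &= \frac{1}{1+r}\,\tilde{\mathbb{E}}[v_n(S_n, M_n) \mid \mathcal{F}_{n-1}]\\
&= \frac{1}{1+r}\left[\tilde{p}\,v_n(uS_{n-1}, uS_{n-1} \vee M_{n-1}) + \tilde{q}\,v_n(dS_{n-1}, M_{n-1})\right].
\end{aligned}
$$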
This leads us to define

$$v_{n-1}(x,y) \triangleq \frac{1}{1+r}\left[\tilde{p}\,v_n(ux, ux \vee y) + \tilde{q}\,v_n(dx, y)\right]$$

so that

$$V_{n-1} = v_{n-1}(S_{n-1}, M_{n-1}).$$

The general algorithm is

$$v_k(x,y) = \frac{1}{1+r}\left[\tilde{p}\,v_{k+1}(ux, ux \vee y) + \tilde{q}\,v_{k+1}(dx, y)\right],$$

and the value of the option at time $k$ is $v_k(S_k, M_k)$. Since this is a simple European option, the hedging portfolio is given by the usual formula, which in this case is

$$\Delta_k = \frac{v_{k+1}(uS_k, (uS_k) \vee M_k) - v_{k+1}(dS_k, M_k)}{(u-d)S_k}.$$
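The two-variable algorithm can be coded directly. The Python sketch below (names hypothetical) is generic in the payoff $v_n(x, y)$. Applied to the lookback payoff with $n = 2$ and the parameters of Example 4.1, it reproduces $V_0 = 2.24$ and $\Delta_0 = 14/15 \approx 0.93$, since $(M_2 - 5)^+ = \max_{0 \le k \le 2}(S_k - 5)^+$.

```python
from functools import lru_cache

u, d, r = 2.0, 0.5, 0.25
p = (1 + r - d) / (u - d)   # risk-neutral probability of H
q = 1 - p


def make_value_function(payoff, n):
    """Return v(k, x, y): the value at time k when S_k = x and M_k = y."""

    @lru_cache(maxsize=None)
    def v(k, x, y):
        if k == n:
            return payoff(x, y)
        up = v(k + 1, u * x, max(u * x, y))    # M_{k+1} = M_k v (u S_k)
        down = v(k + 1, d * x, y)              # M_{k+1} = M_k after a down move (d < 1)
        return (p * up + q * down) / (1 + r)

    return v


def hedge(v, k, x, y):
    """Delta_k = [v_{k+1}(uS_k, (uS_k) v M_k) - v_{k+1}(dS_k, M_k)] / ((u - d) S_k)."""
    return (v(k + 1, u * x, max(u * x, y)) - v(k + 1, d * x, y)) / ((u - d) * x)


# Lookback payoff (M_2 - 5)^+, which equals max_{0<=k<=2} (S_k - 5)^+ from Example 4.1.
lookback = make_value_function(lambda x, y: max(y - 5.0, 0.0), n=2)
print(lookback(0, 4.0, 4.0), hedge(lookback, 0, 4.0, 4.0))
```

Because the state is the pair $(S_k, M_k)$ rather than the whole path, a different path-dependent payoff of this form (for instance the knock-in barrier) only requires a different `payoff` argument.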


Chapter 5 


Stopping Times and American Options 


5.1 American Pricing 


Let us first review the European pricing formula in a Markov model. Consider the binomial model with $n$ periods. Let $V_n = g(S_n)$ be the payoff of a derivative security. Define by backward recursion:

$$
\begin{aligned}
v_n(x) &= g(x),\\
v_k(x) &= \frac{1}{1+r}\left[\tilde{p}\,v_{k+1}(ux) + \tilde{q}\,v_{k+1}(dx)\right].
\end{aligned}
$$

Then $v_k(S_k)$ is the value of the option at time $k$, and the hedging portfolio is given by

$$\Delta_k = \frac{v_{k+1}(uS_k) - v_{k+1}(dS_k)}{(u-d)S_k},\qquad k = 0,1,2,\ldots,n-1.$$


Now consider an American option. Again a function $g$ is specified. In any period $k$, the holder of the derivative security can “exercise” and receive payment $g(S_k)$. Thus, the hedging portfolio should create a wealth process which satisfies

$$X_k \ge g(S_k)\quad \forall k,\ \text{almost surely}.$$

This is because the value of the derivative security at time $k$ is at least $g(S_k)$, and the wealth process value at that time must equal the value of the derivative security.


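American algorithm.

$$
\begin{aligned}
v_n(x) &= g(x),\\
v_k(x) &= \max\left\{\frac{1}{1+r}\left[\tilde{p}\,v_{k+1}(ux) + \tilde{q}\,v_{k+1}(dx)\right],\ g(x)\right\}.
\end{aligned}
$$

Then $v_k(S_k)$ is the value of the option at time $k$.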
Figure 5.1: The American put at expiration: $S_2(HH) = 16$, $v_2(16) = 0$; $S_2(HT) = S_2(TH) = 4$, $v_2(4) = 1$; $S_2(TT) = 1$, $v_2(1) = 4$.


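Example 5.1 See Fig. 5.1. $S_0 = 4$, $u = 2$, $d = \frac{1}{2}$, $r = \frac{1}{4}$, $\tilde{p} = \tilde{q} = \frac{1}{2}$, and $g(x) = (5 - x)^+$ (an American put with strike price 5).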


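Then

$$
\begin{aligned}
v_1(8) &= \max\left\{\frac{4}{5}\left[\tfrac{1}{2} \cdot 0 + \tfrac{1}{2} \cdot 1\right],\ (5-8)^+\right\} = \max\{0.40, 0\} = 0.40,\\
v_1(2) &= \max\left\{\frac{4}{5}\left[\tfrac{1}{2} \cdot 1 + \tfrac{1}{2} \cdot 4\right],\ (5-2)^+\right\} = \max\{2, 3\} = 3.00,\\
v_0(4) &= \max\left\{\frac{4}{5}\left[\tfrac{1}{2} \cdot 0.40 + \tfrac{1}{2} \cdot 3.00\right],\ (5-4)^+\right\} = \max\{1.36, 1\} = 1.36.
\end{aligned}
$$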
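The American algorithm differs from the European recursion only by the comparison with $g(x)$ at each node. This Python sketch (the name `american_values` is hypothetical) stores $v_k$ by the number of heads $j$ and reproduces the numbers of Example 5.1.

```python
u, d, r = 2.0, 0.5, 0.25
p = (1 + r - d) / (u - d)
q = 1 - p


def american_values(S0, n, g):
    """v[k][j] = v_k(S0 * u**j * d**(k - j)) from the American algorithm (j = number of heads)."""
    v = [[0.0] * (k + 1) for k in range(n + 1)]
    v[n] = [g(S0 * u**j * d**(n - j)) for j in range(n + 1)]
    for k in range(n - 1, -1, -1):
        for j in range(k + 1):
            hold = (p * v[k + 1][j + 1] + q * v[k + 1][j]) / (1 + r)   # value of not exercising
            v[k][j] = max(hold, g(S0 * u**j * d**(k - j)))            # compare with exercising now
    return v


def put(x):
    return max(5.0 - x, 0.0)   # g(x) = (5 - x)^+


v = american_values(4.0, 2, put)
print(v[0][0], v[1][1], v[1][0])   # v_0(4), v_1(8), v_1(2)
```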


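Let us now construct the hedging portfolio for this option. Begin with initial wealth $X_0 = 1.36$. Compute $\Delta_0$ as follows:

$$
\begin{aligned}
0.40 &= v_1(S_1(H)) = S_1(H)\Delta_0 + (1+r)(X_0 - \Delta_0 S_0)\\
&= 8\Delta_0 + \tfrac{5}{4}(1.36 - 4\Delta_0)\\
&= 3\Delta_0 + 1.70 \implies \Delta_0 = -0.43,\\[4pt]
3.00 &= v_1(S_1(T)) = S_1(T)\Delta_0 + (1+r)(X_0 - \Delta_0 S_0)\\
&= 2\Delta_0 + \tfrac{5}{4}(1.36 - 4\Delta_0)\\
&= -3\Delta_0 + 1.70 \implies \Delta_0 = -0.43.
\end{aligned}
$$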
Using $\Delta_0 = -0.43$ results in

$$X_1(H) = v_1(S_1(H)) = 0.40,\qquad X_1(T) = v_1(S_1(T)) = 3.00.$$


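Now let us compute $\Delta_1$ (recall that $S_1(T) = 2$):

$$
\begin{aligned}
1 &= v_2(4) = S_2(TH)\Delta_1(T) + (1+r)\left(X_1(T) - \Delta_1(T)S_1(T)\right)\\
&= 4\Delta_1(T) + \tfrac{5}{4}\left(3 - 2\Delta_1(T)\right)\\
&= 1.5\Delta_1(T) + 3.75 \implies \Delta_1(T) = -1.83,\\[4pt]
4 &= v_2(1) = S_2(TT)\Delta_1(T) + (1+r)\left(X_1(T) - \Delta_1(T)S_1(T)\right)\\
&= \Delta_1(T) + \tfrac{5}{4}\left(3 - 2\Delta_1(T)\right)\\
&= -1.5\Delta_1(T) + 3.75 \implies \Delta_1(T) = -0.17.
\end{aligned}
$$

We get different answers for $\Delta_1(T)$! If we had $X_1(T) = 2$, the value of the European put, we would have

$$1 = 1.5\Delta_1(T) + 2.5 \implies \Delta_1(T) = -1,$$

$$4 = -1.5\Delta_1(T) + 2.5 \implies \Delta_1(T) = -1.$$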


5.2 Value of Portfolio Hedging an American Option 


$$
\begin{aligned}
X_{k+1} &= \Delta_k S_{k+1} + (1+r)(X_k - C_k - \Delta_k S_k)\\
&= (1+r)X_k + \Delta_k\left(S_{k+1} - (1+r)S_k\right) - (1+r)C_k.
\end{aligned}
$$

Here, $C_k$ is the amount “consumed” at time $k$.


• The discounted value of the portfolio is a supermartingale.

• The value satisfies $X_k \ge g(S_k)$, $k = 0,1,\ldots,n$.

• The value process is the smallest process with these properties.


When do you consume? If

$$\tilde{\mathbb{E}}\left[(1+r)^{-(k+1)}v_{k+1}(S_{k+1}) \mid \mathcal{F}_k\right] < (1+r)^{-k}v_k(S_k),$$

or, equivalently,

$$\tilde{\mathbb{E}}\left[\frac{1}{1+r}\,v_{k+1}(S_{k+1}) \,\middle|\, \mathcal{F}_k\right] < v_k(S_k),$$

and the holder of the American option does not exercise, then the seller of the option can consume to close the gap. By doing this, he can ensure that $X_k = v_k(S_k)$ for all $k$, where $v_k$ is the value defined by the American algorithm in Section 5.1.


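In the previous example, $v_1(S_1(T)) = 3$, $v_2(S_2(TH)) = 1$ and $v_2(S_2(TT)) = 4$. Therefore,

$$
\begin{aligned}
\tilde{\mathbb{E}}\left[\frac{1}{1+r}\,v_2(S_2) \,\middle|\, \mathcal{F}_1\right](T) &= \frac{4}{5}\left[\tfrac{1}{2} \cdot 1 + \tfrac{1}{2} \cdot 4\right] = \frac{4}{5} \cdot \frac{5}{2} = 2,\\
v_1(S_1(T)) &= 3,
\end{aligned}
$$

so there is a gap of size 1. If the owner of the option does not exercise it at time one in the state $\omega_1 = T$, then the seller can consume 1 at time 1. Thereafter, he uses the usual hedging portfolio

$$\Delta_k = \frac{v_{k+1}(uS_k) - v_{k+1}(dS_k)}{(u-d)S_k}.$$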


In the example, we have $v_1(S_1(T)) = g(S_1(T))$. It is optimal for the owner of the American option to exercise whenever its value $v_k(S_k)$ agrees with its intrinsic value $g(S_k)$.
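The consumption rule repairs the inconsistency found in Example 5.1. In the Python sketch below (variable names hypothetical), the seller follows the path whose first toss is $T$, consumes $C_1 = 1$ at time 1, and then uses the usual hedge, which gives $\Delta_1(T) = -1$; the resulting wealth matches $v_2(4) = 1$ and $v_2(1) = 4$.

```python
u, d, r, S0 = 2.0, 0.5, 0.25, 4.0
p = q = 0.5
v1 = {8.0: 0.40, 2.0: 3.0}               # American values at time 1 (Example 5.1)
v2 = {16.0: 0.0, 4.0: 1.0, 1.0: 4.0}     # v_2(x) = (5 - x)^+


def delta(v_next, x):
    """Usual hedge ratio (v_{k+1}(ux) - v_{k+1}(dx)) / ((u - d) x)."""
    return (v_next[u * x] - v_next[d * x]) / ((u - d) * x)


X0 = 1.36
D0 = delta(v1, S0)
S1 = d * S0                                   # first toss is T, so S_1 = 2
X1 = D0 * S1 + (1 + r) * (X0 - D0 * S0)       # wealth at time 1 equals v_1(2) = 3

# Consume the gap between v_1(2) and the discounted expected value of v_2.
C1 = v1[S1] - (p * v2[u * S1] + q * v2[d * S1]) / (1 + r)

D1 = delta(v2, S1)
bank = (1 + r) * (X1 - C1 - D1 * S1)
X2 = {"TH": D1 * u * S1 + bank, "TT": D1 * d * S1 + bank}
print(round(D0, 4), round(X1, 4), round(C1, 4), D1, X2)
```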


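Definition 5.1 (Stopping Time) Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space and let $\{\mathcal{F}_k\}_{k=0}^n$ be a filtration. A stopping time is a random variable $\tau : \Omega \to \{0,1,2,\ldots,n\} \cup \{\infty\}$ with the property that:

$$\{\omega \in \Omega;\ \tau(\omega) = k\} \in \mathcal{F}_k,\qquad \forall k = 0,1,\ldots,n,\infty.$$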


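Example 5.2 Consider the binomial model with $n = 2$, $S_0 = 4$, $u = 2$, $d = \frac{1}{2}$, $r = \frac{1}{4}$, so $\tilde{p} = \tilde{q} = \frac{1}{2}$. Let $v_0, v_1, v_2$ be the value functions defined for the American put with strike price 5. Define

$$\tau(\omega) = \min\{k;\ v_k(S_k) = (5 - S_k)^+\}.$$

The stopping time $\tau$ corresponds to “stopping the first time the value of the option agrees with its intrinsic value”. It is an optimal exercise time. We note that

$$\tau(\omega) = \begin{cases} 1 & \text{if } \omega \in A_T,\\ 2 & \text{if } \omega \in A_H, \end{cases}$$

$$
\begin{aligned}
\{\omega;\ \tau(\omega) = 0\} &= \emptyset \in \mathcal{F}_0,\\
\{\omega;\ \tau(\omega) = 1\} &= A_T \in \mathcal{F}_1,\\
\{\omega;\ \tau(\omega) = 2\} &= A_H \in \mathcal{F}_2.
\end{aligned}
$$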


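Example 5.3 (A random time which is not a stopping time) In the same binomial model as in the previous example, define

$$\rho(\omega) = \min\{k;\ S_k(\omega) = m_2(\omega)\},$$

where $m_2 \triangleq \min_{0 \le j \le 2} S_j$. In other words, $\rho$ stops when the stock price reaches its minimum value. This random variable is given by

$$\rho(\omega) = \begin{cases} 0 & \text{if } \omega \in A_H,\\ 1 & \text{if } \omega = TH,\\ 2 & \text{if } \omega = TT, \end{cases}$$

$$
\begin{aligned}
\{\omega;\ \rho(\omega) = 0\} &= A_H \notin \mathcal{F}_0,\\
\{\omega;\ \rho(\omega) = 1\} &= \{TH\} \notin \mathcal{F}_1,\\
\{\omega;\ \rho(\omega) = 2\} &= \{TT\} \in \mathcal{F}_2.
\end{aligned}
$$

Deciding whether $\rho = 0$ requires knowing the first toss, which is not yet revealed at time 0; this is why $\rho$ fails to be a stopping time.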
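On a finite sample space the defining condition can be checked mechanically: an event lies in $\mathcal{F}_k$ exactly when membership depends only on the first $k$ tosses. The Python sketch below (helper names hypothetical) computes $\rho$ from the stock price paths and confirms that $\tau$ of Example 5.2 is a stopping time while $\rho$ of Example 5.3 is not.

```python
from itertools import product

OMEGA = ["".join(w) for w in product("HT", repeat=2)]   # HH, HT, TH, TT


def in_F(k, event):
    """True if `event` is in F_k, i.e. membership depends only on the first k tosses."""
    return all((w in event) == (v in event)
               for w in OMEGA for v in OMEGA if w[:k] == v[:k])


def is_stopping_time(tau, n=2):
    """Check {w : tau(w) = k} in F_k for k = 0, ..., n."""
    return all(in_F(k, {w for w in OMEGA if tau[w] == k}) for k in range(n + 1))


def path_prices(w, S0=4.0, u=2.0, d=0.5):
    s = [S0]
    for toss in w:
        s.append(s[-1] * (u if toss == "H" else d))
    return s


tau = {"HH": 2, "HT": 2, "TH": 1, "TT": 1}   # Example 5.2
rho = {}
for w in OMEGA:                               # Example 5.3: first time S_k hits its minimum
    s = path_prices(w)
    rho[w] = s.index(min(s))
print(rho, is_stopping_time(tau), is_stopping_time(rho))
```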


5.3 Information up to a Stopping Time 


Definition 5.2 Let $\tau$ be a stopping time. We say that a set $A \subseteq \Omega$ is determined by time $\tau$ provided that

$$A \cap \{\omega;\ \tau(\omega) = k\} \in \mathcal{F}_k,\qquad \forall k.$$

The collection of sets determined by $\tau$ is a $\sigma$-algebra, which we denote by $\mathcal{F}_\tau$.
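Example 5.4 In the binomial model considered earlier, let

$$\tau = \min\{k;\ v_k(S_k) = (5 - S_k)^+\},$$

i.e.,

$$\tau(\omega) = \begin{cases} 1 & \text{if } \omega \in A_T,\\ 2 & \text{if } \omega \in A_H. \end{cases}$$

The set $\{HT\}$ is determined by time $\tau$, but the set $\{TH\}$ is not. Indeed,

$$
\begin{aligned}
\{HT\} \cap \{\omega;\ \tau(\omega) = 0\} &= \emptyset \in \mathcal{F}_0,\\
\{HT\} \cap \{\omega;\ \tau(\omega) = 1\} &= \emptyset \in \mathcal{F}_1,\\
\{HT\} \cap \{\omega;\ \tau(\omega) = 2\} &= \{HT\} \in \mathcal{F}_2,
\end{aligned}
$$

but

$$\{TH\} \cap \{\omega;\ \tau(\omega) = 1\} = \{TH\} \notin \mathcal{F}_1.$$

The atoms of $\mathcal{F}_\tau$ are

$$\{HT\},\quad \{HH\},\quad A_T = \{TH, TT\}.$$

∎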


Notation 5.1 (Value of Stochastic Process at a Stopping Time) If $(\Omega, \mathcal{F}, \mathbb{P})$ is a probability space, $\{\mathcal{F}_k\}_{k=0}^n$ is a filtration under $\mathcal{F}$, $\{X_k\}_{k=0}^n$ is a stochastic process adapted to this filtration, and $\tau$ is a stopping time with respect to the same filtration, then $X_\tau$ is an $\mathcal{F}_\tau$-measurable random variable whose value at $\omega$ is given by

$$X_\tau(\omega) \triangleq X_{\tau(\omega)}(\omega).$$
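Theorem 3.17 (Optional Sampling) Suppose that $\{Y_k, \mathcal{F}_k\}_{k=0}^\infty$ (or $\{Y_k, \mathcal{F}_k\}_{k=0}^n$) is a submartingale. Let $\tau$ and $\rho$ be bounded stopping times, i.e., there is a nonrandom number $n$ such that

$$\tau \le n,\quad \rho \le n,\quad \text{almost surely}.$$

If $\tau \le \rho$ almost surely, then

$$Y_\tau \le \mathbb{E}(Y_\rho \mid \mathcal{F}_\tau).$$

Taking expectations, we obtain $\mathbb{E}Y_\tau \le \mathbb{E}Y_\rho$, and in particular, $Y_0 = \mathbb{E}Y_0 \le \mathbb{E}Y_\rho$. If $\{Y_k, \mathcal{F}_k\}_{k=0}^\infty$ is a supermartingale, then $\tau \le \rho$ implies $Y_\tau \ge \mathbb{E}(Y_\rho \mid \mathcal{F}_\tau)$.

If $\{Y_k, \mathcal{F}_k\}_{k=0}^\infty$ is a martingale, then $\tau \le \rho$ implies $Y_\tau = \mathbb{E}(Y_\rho \mid \mathcal{F}_\tau)$.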


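Example 5.5 In Example 5.4 considered earlier, we define $\rho(\omega) = 2$ for all $\omega \in \Omega$. Under the risk-neutral probability measure, the discounted stock price process $\left(\frac{4}{5}\right)^k S_k$ is a martingale. We compute

$$\tilde{\mathbb{E}}\left[\left(\tfrac{4}{5}\right)^2 S_2 \,\middle|\, \mathcal{F}_\tau\right].$$

The atoms of $\mathcal{F}_\tau$ are $\{HH\}$, $\{HT\}$, and $A_T$. Therefore,

$$
\begin{aligned}
\tilde{\mathbb{E}}\left[\left(\tfrac{4}{5}\right)^2 S_2 \,\middle|\, \mathcal{F}_\tau\right](HH) &= \left(\tfrac{4}{5}\right)^2 S_2(HH),\\
\tilde{\mathbb{E}}\left[\left(\tfrac{4}{5}\right)^2 S_2 \,\middle|\, \mathcal{F}_\tau\right](HT) &= \left(\tfrac{4}{5}\right)^2 S_2(HT),
\end{aligned}
$$

and for $\omega \in A_T$,

$$
\begin{aligned}
\tilde{\mathbb{E}}\left[\left(\tfrac{4}{5}\right)^2 S_2 \,\middle|\, \mathcal{F}_\tau\right](\omega) &= \tfrac{1}{2}\left(\tfrac{4}{5}\right)^2 S_2(TH) + \tfrac{1}{2}\left(\tfrac{4}{5}\right)^2 S_2(TT)\\
&= \tfrac{1}{2} \times 2.56 + \tfrac{1}{2} \times 0.64\\
&= 1.60 = \tfrac{4}{5}S_1(T).
\end{aligned}
$$

In every case we have gotten (see Fig. 5.2)

$$\tilde{\mathbb{E}}\left[\left(\tfrac{4}{5}\right)^\rho S_\rho \,\middle|\, \mathcal{F}_\tau\right](\omega) = \left(\tfrac{4}{5}\right)^{\tau(\omega)} S_{\tau(\omega)}(\omega).$$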
Figure 5.2: Illustrating the optional sampling theorem. Discounted stock prices $(4/5)^k S_k$: $S_0 = 4$; $(4/5)S_1(H) = 6.40$, $(4/5)S_1(T) = 1.60$; $(16/25)S_2(HH) = 10.24$, $(16/25)S_2(HT) = (16/25)S_2(TH) = 2.56$, $(16/25)S_2(TT) = 0.64$.
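The computation of Example 5.5 can be repeated in exact arithmetic. The Python sketch below (names hypothetical) averages $(4/5)^2 S_2$ over each atom of $\mathcal{F}_\tau$, using that all four outcomes are equally likely when $\tilde{p} = \tilde{q} = 1/2$, and checks the result against $(4/5)^{\tau} S_{\tau}$ on that atom.

```python
from fractions import Fraction

disc = Fraction(4, 5)                                  # 1 / (1 + r) with r = 1/4
S1 = {"H": Fraction(8), "T": Fraction(2)}
S2 = {"HH": Fraction(16), "HT": Fraction(4), "TH": Fraction(4), "TT": Fraction(1)}
tau = {"HH": 2, "HT": 2, "TH": 1, "TT": 1}
atoms = [("HH",), ("HT",), ("TH", "TT")]               # atoms of F_tau


def discounted_at_tau(w):
    """(4/5)^{tau(w)} S_{tau(w)}(w)."""
    k = tau[w]
    return disc**k * (S2[w] if k == 2 else S1[w[0]])


def cond_exp_on_atom(atom):
    """E~[(4/5)^2 S_2 | F_tau] on an atom; outcomes are equally likely since p = q = 1/2."""
    return sum(disc**2 * S2[w] for w in atom) / len(atom)


for atom in atoms:
    value = cond_exp_on_atom(atom)
    assert all(value == discounted_at_tau(w) for w in atom)   # optional sampling for a martingale
    print(atom, value, float(value))
```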
Chapter 6


Properties of American Derivative 
Securities 


6.1 The properties 


Definition 6.1 An American derivative security is a sequence of non-negative random variables $\{G_k\}_{k=0}^n$ such that each $G_k$ is $\mathcal{F}_k$-measurable. The owner of an American derivative security can exercise at any time $k$, and if he does, he receives the payment $G_k$.


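(a) The value $V_k$ of the security at time $k$ is

$$V_k = \max_\tau\,(1+r)^k\,\tilde{\mathbb{E}}\left[(1+r)^{-\tau}G_\tau \mid \mathcal{F}_k\right],$$

where the maximum is over all stopping times $\tau$ satisfying $\tau \ge k$ almost surely.

(b) The discounted value process $\{(1+r)^{-k}V_k\}_{k=0}^n$ is the smallest supermartingale which satisfies

$$V_k \ge G_k,\quad \forall k,\ \text{almost surely}.$$

(c) Any stopping time $\tau$ which satisfies

$$V_0 = \tilde{\mathbb{E}}\left[(1+r)^{-\tau}G_\tau\right]$$

is an optimal exercise time. In particular

$$\tau^* \triangleq \min\{k;\ V_k = G_k\}$$

is an optimal exercise time.

(d) The hedging portfolio is given by

$$\Delta_k(\omega_1,\ldots,\omega_k) = \frac{V_{k+1}(\omega_1,\ldots,\omega_k,H) - V_{k+1}(\omega_1,\ldots,\omega_k,T)}{S_{k+1}(\omega_1,\ldots,\omega_k,H) - S_{k+1}(\omega_1,\ldots,\omega_k,T)}.$$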
(e) Suppose for some $k$ and $\omega$, we have $V_k(\omega) = G_k(\omega)$. Then the owner of the derivative security should exercise it. If he does not, then the seller of the security can immediately consume

$$V_k(\omega) - \frac{1}{1+r}\,\tilde{\mathbb{E}}[V_{k+1} \mid \mathcal{F}_k](\omega)$$

and still maintain the hedge.
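Property (a) with $k = 0$ says $V_0$ is a maximum over stopping times, and in the two-period model there are few enough of them to list: stop at time 0, or decide at time 1 on the basis of the first toss (otherwise stop at time 2). The following Python sketch (names hypothetical) does this for the American put of Example 5.1 (strike 5, $S_0 = 4$, $u = 2$, $d = 1/2$, $r = 1/4$). The maximum is $1.36$, matching the American algorithm, and it is attained by $\tau^*$ of property (c), which here is the stopping time of Example 5.2.

```python
from itertools import product

u, d, r, S0, K = 2.0, 0.5, 0.25, 4.0, 5.0
OMEGA = ["".join(w) for w in product("HT", repeat=2)]   # each outcome has probability 1/4 here


def S(w, k):
    """S_k(w): stock price after the first k tosses of w."""
    return S0 * u ** w[:k].count("H") * d ** w[:k].count("T")


def G(w, k):
    """American put payoff G_k = (K - S_k)^+."""
    return max(K - S(w, k), 0.0)


def value(tau):
    """E~[(1+r)^{-tau} G_tau] for tau given as a dict from outcomes to times."""
    return sum((1 + r) ** -tau[w] * G(w, tau[w]) for w in OMEGA) / len(OMEGA)


# All stopping times with values in {0, 1, 2}: stop at once, or decide at time 1
# using only the first toss (otherwise stop at the expiration time 2).
stopping_times = [{w: 0 for w in OMEGA}]
for stop_H, stop_T in product([True, False], repeat=2):
    stop = {"H": stop_H, "T": stop_T}
    stopping_times.append({w: 1 if stop[w[0]] else 2 for w in OMEGA})

V0 = max(value(tau) for tau in stopping_times)
tau_star = {w: 1 if w[0] == "T" else 2 for w in OMEGA}   # tau* = min{k : V_k = G_k}
print(V0, value(tau_star))
```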


6.2 Proofs of the Properties 


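Let $\{G_k\}_{k=0}^n$ be a sequence of non-negative random variables such that each $G_k$ is $\mathcal{F}_k$-measurable. Define $\mathcal{T}_k$ to be the set of all stopping times $\tau$ satisfying $k \le \tau \le n$ almost surely. Define also

$$V_k \triangleq (1+r)^k \max_{\tau \in \mathcal{T}_k} \tilde{\mathbb{E}}\left[(1+r)^{-\tau}G_\tau \mid \mathcal{F}_k\right].$$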


Lemma 2.18 $V_k \ge G_k$ for every $k$.

Proof: Take $\tau \in \mathcal{T}_k$ to be the constant $k$. ∎


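Lemma 2.19 The process $\{(1+r)^{-k}V_k\}_{k=0}^n$ is a supermartingale.

Proof: Let $\tau^*$ attain the maximum in the definition of $V_{k+1}$, i.e.,

$$(1+r)^{-(k+1)}V_{k+1} = \tilde{\mathbb{E}}\left[(1+r)^{-\tau^*}G_{\tau^*} \mid \mathcal{F}_{k+1}\right].$$

Because $\tau^*$ is also in $\mathcal{T}_k$, we have

$$
\begin{aligned}
\tilde{\mathbb{E}}\left[(1+r)^{-(k+1)}V_{k+1} \mid \mathcal{F}_k\right] &= \tilde{\mathbb{E}}\left[\tilde{\mathbb{E}}\left[(1+r)^{-\tau^*}G_{\tau^*} \mid \mathcal{F}_{k+1}\right] \mid \mathcal{F}_k\right]\\
&= \tilde{\mathbb{E}}\left[(1+r)^{-\tau^*}G_{\tau^*} \mid \mathcal{F}_k\right]\\
&\le \max_{\tau \in \mathcal{T}_k} \tilde{\mathbb{E}}\left[(1+r)^{-\tau}G_\tau \mid \mathcal{F}_k\right]\\
&= (1+r)^{-k}V_k.
\end{aligned}
$$

∎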


Lemma 2.20 If $\{Y_k\}_{k=0}^n$ is another process satisfying

$$Y_k \ge G_k,\quad k = 0,1,\ldots,n,\ \text{a.s.},$$

and $\{(1+r)^{-k}Y_k\}_{k=0}^n$ is a supermartingale, then

$$Y_k \ge V_k,\quad k = 0,1,\ldots,n,\ \text{a.s.}$$
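Proof: The optional sampling theorem for the supermartingale $\{(1+r)^{-k}Y_k\}_{k=0}^n$ implies

$$\tilde{\mathbb{E}}\left[(1+r)^{-\tau}Y_\tau \mid \mathcal{F}_k\right] \le (1+r)^{-k}Y_k,\qquad \forall \tau \in \mathcal{T}_k.$$

Therefore,

$$
\begin{aligned}
V_k &= (1+r)^k \max_{\tau \in \mathcal{T}_k} \tilde{\mathbb{E}}\left[(1+r)^{-\tau}G_\tau \mid \mathcal{F}_k\right]\\
&\le (1+r)^k \max_{\tau \in \mathcal{T}_k} \tilde{\mathbb{E}}\left[(1+r)^{-\tau}Y_\tau \mid \mathcal{F}_k\right]\\
&\le (1+r)^k (1+r)^{-k}Y_k\\
&= Y_k.
\end{aligned}
$$

∎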
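Lemma 2.21 Define

$$
\begin{aligned}
C_k &\triangleq V_k - \frac{1}{1+r}\,\tilde{\mathbb{E}}[V_{k+1} \mid \mathcal{F}_k]\\
&= (1+r)^k\left\{(1+r)^{-k}V_k - \tilde{\mathbb{E}}\left[(1+r)^{-(k+1)}V_{k+1} \mid \mathcal{F}_k\right]\right\}.
\end{aligned}
$$

Since $\{(1+r)^{-k}V_k\}_{k=0}^n$ is a supermartingale, $C_k$ must be non-negative almost surely. Define

$$\Delta_k(\omega_1,\ldots,\omega_k) = \frac{V_{k+1}(\omega_1,\ldots,\omega_k,H) - V_{k+1}(\omega_1,\ldots,\omega_k,T)}{S_{k+1}(\omega_1,\ldots,\omega_k,H) - S_{k+1}(\omega_1,\ldots,\omega_k,T)}.$$

Set $X_0 = V_0$ and define recursively

$$X_{k+1} = \Delta_k S_{k+1} + (1+r)(X_k - C_k - \Delta_k S_k).$$

Then

$$X_k = V_k\qquad \forall k.$$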


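Proof: We proceed by induction on $k$. The induction hypothesis is that $X_k = V_k$ for some $k \in \{0,1,\ldots,n-1\}$, i.e., for each fixed $(\omega_1,\ldots,\omega_k)$ we have

$$X_k(\omega_1,\ldots,\omega_k) = V_k(\omega_1,\ldots,\omega_k).$$

We need to show that

$$X_{k+1}(\omega_1,\ldots,\omega_k,H) = V_{k+1}(\omega_1,\ldots,\omega_k,H),$$

$$X_{k+1}(\omega_1,\ldots,\omega_k,T) = V_{k+1}(\omega_1,\ldots,\omega_k,T).$$

We prove the first equality; the proof of the second is similar. Note first that

$$
\begin{aligned}
V_k(\omega_1,\ldots,\omega_k) - C_k(\omega_1,\ldots,\omega_k) &= \frac{1}{1+r}\,\tilde{\mathbb{E}}[V_{k+1} \mid \mathcal{F}_k](\omega_1,\ldots,\omega_k)\\
&= \frac{1}{1+r}\left(\tilde{p}\,V_{k+1}(\omega_1,\ldots,\omega_k,H) + \tilde{q}\,V_{k+1}(\omega_1,\ldots,\omega_k,T)\right).
\end{aligned}
$$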
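Since $(\omega_1,\ldots,\omega_k)$ will be fixed for the rest of the proof, we will suppress these symbols. For example, the last equation can be written simply as

$$V_k - C_k = \frac{1}{1+r}\left(\tilde{p}\,V_{k+1}(H) + \tilde{q}\,V_{k+1}(T)\right).$$

We compute

$$
\begin{aligned}
X_{k+1}(H) &= \Delta_k S_{k+1}(H) + (1+r)(X_k - C_k - \Delta_k S_k)\\
&= \frac{V_{k+1}(H) - V_{k+1}(T)}{S_{k+1}(H) - S_{k+1}(T)}\left(S_{k+1}(H) - (1+r)S_k\right) + (1+r)(V_k - C_k)\\
&= \frac{V_{k+1}(H) - V_{k+1}(T)}{(u-d)S_k}\left(uS_k - (1+r)S_k\right) + \tilde{p}\,V_{k+1}(H) + \tilde{q}\,V_{k+1}(T)\\
&= \left(V_{k+1}(H) - V_{k+1}(T)\right)\tilde{q} + \tilde{p}\,V_{k+1}(H) + \tilde{q}\,V_{k+1}(T)\\
&= V_{k+1}(H).
\end{aligned}
$$

∎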


6.3 Compound European Derivative Securities 


In order to derive the optimal stopping time for an American derivative security, it will be useful to 
study compound European derivative securities, which are also interesting in their own right. 


A compound European derivative security consists of $n+1$ different simple European derivative securities (with the same underlying stock) expiring at times $0, 1, \ldots, n$; the security that expires at time $j$ has payoff $C_j$. Thus a compound European derivative security is specified by the process $\{C_j\}_{j=0}^n$, where each $C_j$ is $\mathcal{F}_j$-measurable, i.e., the process $\{C_j\}_{j=0}^n$ is adapted to the filtration $\{\mathcal{F}_k\}_{k=0}^n$.

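Hedging a short position (one payment). Here is how we can hedge a short position in the $j$'th European derivative security. The value of European derivative security $j$ at time $k$ is given by

$$V_k^{(j)} = (1+r)^k\,\tilde{\mathbb{E}}\left[(1+r)^{-j}C_j \mid \mathcal{F}_k\right],\qquad k = 0,\ldots,j,$$

and the hedging portfolio for that security is given by

$$\Delta_k^{(j)}(\omega_1,\ldots,\omega_k) = \frac{V_{k+1}^{(j)}(\omega_1,\ldots,\omega_k,H) - V_{k+1}^{(j)}(\omega_1,\ldots,\omega_k,T)}{S_{k+1}(\omega_1,\ldots,\omega_k,H) - S_{k+1}(\omega_1,\ldots,\omega_k,T)},\qquad k = 0,\ldots,j-1.$$

Thus, starting with wealth $V_0^{(j)}$ and using the portfolio $(\Delta_0^{(j)}, \ldots, \Delta_{j-1}^{(j)})$, we can ensure that at time $j$ we have wealth $C_j$.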


Hedging a short position (all payments). Superpose the hedges for the individual payments. In other words, start with wealth $V_0 = \sum_{j=0}^n V_0^{(j)}$. At each time $k \in \{0,1,\ldots,n-1\}$, first make the payment $C_k$ and then use the portfolio

$$\Delta_k = \Delta_k^{(k+1)} + \Delta_k^{(k+2)} + \cdots + \Delta_k^{(n)}$$

corresponding to all future payments. At the final time $n$, after making the final payment $C_n$, we will have exactly zero wealth.


Suppose you own a compound European derivative security $\{C_j\}_{j=0}^n$. Compute

$$V_0 = \sum_{j=0}^n V_0^{(j)} = \tilde{\mathbb{E}}\left[\sum_{j=0}^n (1+r)^{-j}C_j\right]$$

and the hedging portfolio $\{\Delta_k\}_{k=0}^{n-1}$. You can borrow $V_0$ and consume it immediately. This leaves you with wealth $X_0 = -V_0$. In each period $k$, receive the payment $C_k$ and then use the portfolio $-\Delta_k$. At the final time $n$, after receiving the last payment $C_n$, your wealth will reach zero, i.e., you will no longer have a debt.


6.4 Optimal Exercise of American Derivative Security 


In this section we derive the optimal exercise time for the owner of an American derivative security. Let $\{G_k\}_{k=0}^n$ be an American derivative security. Let $\tau$ be the stopping time the owner plans to use. (We assume that each $G_k$ is non-negative, so we may assume without loss of generality that the owner stops at expiration, time $n$, if not before.) Using the stopping time $\tau$, in period $j$ the owner will receive the payment

$$C_j = \mathbb{I}_{\{\tau = j\}}G_j.$$

In other words, once he chooses a stopping time, the owner has effectively converted the American derivative security into a compound European derivative security, whose value is

$$
\begin{aligned}
V_0^{(\tau)} &= \tilde{\mathbb{E}}\left[\sum_{j=0}^n (1+r)^{-j}C_j\right]\\
&= \tilde{\mathbb{E}}\left[\sum_{j=0}^n (1+r)^{-j}\,\mathbb{I}_{\{\tau = j\}}G_j\right]\\
&= \tilde{\mathbb{E}}\left[(1+r)^{-\tau}G_\tau\right].
\end{aligned}
$$

The owner of the American derivative security can borrow this amount of money immediately, if he chooses, and invest in the market so as to exactly pay off his debt as the payments $\{C_j\}_{j=0}^n$ are received. Thus, his optimal behavior is to use a stopping time $\tau$ which maximizes $V_0^{(\tau)}$.
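Lemma 4.22 $V_0^{(\tau)}$ is maximized by the stopping time

$$\tau^* = \min\{k;\ V_k = G_k\}.$$

Proof: Recall the definition

$$V_0 \triangleq \max_{\tau \in \mathcal{T}_0} \tilde{\mathbb{E}}\left[(1+r)^{-\tau}G_\tau\right] = \max_{\tau \in \mathcal{T}_0} V_0^{(\tau)}.$$

Let $\tau'$ be a stopping time which maximizes $V_0^{(\tau)}$, i.e., $V_0 = \tilde{\mathbb{E}}\left[(1+r)^{-\tau'}G_{\tau'}\right]$. Because $\{(1+r)^{-k}V_k\}_{k=0}^n$ is a supermartingale, we have from the optional sampling theorem and the inequality $V_k \ge G_k$ the following:

$$
\begin{aligned}
V_0 &\ge \tilde{\mathbb{E}}\left[(1+r)^{-\tau'}V_{\tau'} \mid \mathcal{F}_0\right]\\
&= \tilde{\mathbb{E}}\left[(1+r)^{-\tau'}V_{\tau'}\right]\\
&\ge \tilde{\mathbb{E}}\left[(1+r)^{-\tau'}G_{\tau'}\right]\\
&= V_0.
\end{aligned}
$$

Therefore,

$$V_0 = \tilde{\mathbb{E}}\left[(1+r)^{-\tau'}V_{\tau'}\right] = \tilde{\mathbb{E}}\left[(1+r)^{-\tau'}G_{\tau'}\right],$$

and, since $V_{\tau'} \ge G_{\tau'}$,

$$V_{\tau'} = G_{\tau'},\quad \text{a.s.}$$

We have just shown that if $\tau'$ attains the maximum in the formula

$$V_0 = \max_{\tau \in \mathcal{T}_0} \tilde{\mathbb{E}}\left[(1+r)^{-\tau}G_\tau\right], \tag{4.1}$$

then

$$V_{\tau'} = G_{\tau'},\quad \text{a.s.}$$

But we have defined

$$\tau^* = \min\{k;\ V_k = G_k\},$$

and so we must have $\tau^* \le \tau' \le n$ almost surely. The optional sampling theorem implies

$$
\begin{aligned}
(1+r)^{-\tau^*}G_{\tau^*} &= (1+r)^{-\tau^*}V_{\tau^*}\\
&\ge \tilde{\mathbb{E}}\left[(1+r)^{-\tau'}V_{\tau'} \mid \mathcal{F}_{\tau^*}\right]\\
&= \tilde{\mathbb{E}}\left[(1+r)^{-\tau'}G_{\tau'} \mid \mathcal{F}_{\tau^*}\right].
\end{aligned}
$$

Taking expectations on both sides, we obtain

$$\tilde{\mathbb{E}}\left[(1+r)^{-\tau^*}G_{\tau^*}\right] \ge \tilde{\mathbb{E}}\left[(1+r)^{-\tau'}G_{\tau'}\right] = V_0.$$

It follows that $\tau^*$ also attains the maximum in (4.1), and is therefore an optimal exercise time for the American derivative security. ∎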


Chapter 7 


Jensen’s Inequality 


7.1 Jensen’s Inequality for Conditional Expectations 


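Lemma 1.23 If $\varphi : \mathbb{R} \to \mathbb{R}$ is convex and $\mathbb{E}|\varphi(X)| < \infty$, then

$$\mathbb{E}[\varphi(X) \mid \mathcal{G}] \ge \varphi(\mathbb{E}[X \mid \mathcal{G}]).$$

For instance, if $\mathcal{G} = \{\emptyset, \Omega\}$ and $\varphi(x) = x^2$:

$$\mathbb{E}X^2 \ge (\mathbb{E}X)^2.$$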


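Proof: Since $\varphi$ is convex we can express it as follows (see Fig. 7.1):

$$\varphi(x) = \max_{\substack{h \le \varphi\\ h \text{ is linear}}} h(x).$$

Now let $h(x) = ax + b$ lie below $\varphi$. Then,

$$
\begin{aligned}
\mathbb{E}[\varphi(X) \mid \mathcal{G}] &\ge \mathbb{E}[aX + b \mid \mathcal{G}]\\
&= a\,\mathbb{E}[X \mid \mathcal{G}] + b\\
&= h(\mathbb{E}[X \mid \mathcal{G}]).
\end{aligned}
$$

This implies

$$
\begin{aligned}
\mathbb{E}[\varphi(X) \mid \mathcal{G}] &\ge \max_{\substack{h \le \varphi\\ h \text{ is linear}}} h(\mathbb{E}[X \mid \mathcal{G}])\\
&= \varphi(\mathbb{E}[X \mid \mathcal{G}]).
\end{aligned}
$$

∎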
Figure 7.1: Expressing a convex function as a max over linear functions.


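Theorem 1.24 If $\{Y_k\}_{k=0}^n$ is a martingale and $\varphi$ is convex then $\{\varphi(Y_k)\}_{k=0}^n$ is a submartingale.

Proof:

$$
\begin{aligned}
\mathbb{E}[\varphi(Y_{k+1}) \mid \mathcal{F}_k] &\ge \varphi(\mathbb{E}[Y_{k+1} \mid \mathcal{F}_k])\\
&= \varphi(Y_k).
\end{aligned}
$$

∎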


7.2 Optimal Exercise of an American Call 


This follows from Jensen’s inequality. 


Corollary 2.25 Let $g : [0,\infty) \to \mathbb{R}$ be a convex function with $g(0) = 0$. For instance, $g(x) = (x - K)^+$ is the payoff function for an American call. Assume that $r \ge 0$. Consider the American derivative security with payoff $g(S_k)$ in period $k$. The value of this security is the same as the value of the simple European derivative security with final payoff $g(S_n)$, i.e.,

$$\tilde{\mathbb{E}}\left[(1+r)^{-n}g(S_n)\right] = \max_\tau \tilde{\mathbb{E}}\left[(1+r)^{-\tau}g(S_\tau)\right],$$

where the LHS is the European value and the RHS is the American value. In particular $\tau = n$ is an optimal exercise time.


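Proof: Because $g$ is convex, for all $\lambda \in [0,1]$ we have (see Fig. 7.2):

$$
\begin{aligned}
g(\lambda x) &= g(\lambda x + (1-\lambda) \cdot 0)\\
&\le \lambda g(x) + (1-\lambda)g(0)\\
&= \lambda g(x).
\end{aligned}
$$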
Figure 7.2: Proof of Cor. 2.25 (the points $(x, g(x))$, $(\lambda x, \lambda g(x))$ and $(\lambda x, g(\lambda x))$).


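Therefore, taking $\lambda = \frac{1}{1+r}$,

$$g\left(\frac{1}{1+r}S_{k+1}\right) \le \frac{1}{1+r}\,g(S_{k+1}),$$

and

$$
\begin{aligned}
\tilde{\mathbb{E}}\left[(1+r)^{-(k+1)}g(S_{k+1}) \mid \mathcal{F}_k\right] &= (1+r)^{-k}\,\tilde{\mathbb{E}}\left[\frac{1}{1+r}\,g(S_{k+1}) \,\middle|\, \mathcal{F}_k\right]\\
&\ge (1+r)^{-k}\,\tilde{\mathbb{E}}\left[g\left(\frac{1}{1+r}S_{k+1}\right) \,\middle|\, \mathcal{F}_k\right]\\
&\ge (1+r)^{-k}\,g\left(\frac{1}{1+r}\,\tilde{\mathbb{E}}[S_{k+1} \mid \mathcal{F}_k]\right)\\
&= (1+r)^{-k}g(S_k),
\end{aligned}
$$

where the second inequality is Jensen's inequality and the last step uses $\tilde{\mathbb{E}}[S_{k+1} \mid \mathcal{F}_k] = (1+r)S_k$. So $\{(1+r)^{-k}g(S_k)\}_{k=0}^n$ is a submartingale. Let $\tau$ be a stopping time satisfying $0 \le \tau \le n$. The optional sampling theorem implies

$$(1+r)^{-\tau}g(S_\tau) \le \tilde{\mathbb{E}}\left[(1+r)^{-n}g(S_n) \mid \mathcal{F}_\tau\right].$$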

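Taking expectations, we obtain

$$
\begin{aligned}
\tilde{\mathbb{E}}\left[(1+r)^{-\tau}g(S_\tau)\right] &\le \tilde{\mathbb{E}}\left(\tilde{\mathbb{E}}\left[(1+r)^{-n}g(S_n) \mid \mathcal{F}_\tau\right]\right)\\
&= \tilde{\mathbb{E}}\left[(1+r)^{-n}g(S_n)\right].
\end{aligned}
$$

Therefore, the value of the American derivative security is

$$\max_\tau \tilde{\mathbb{E}}\left[(1+r)^{-\tau}g(S_\tau)\right] \le \tilde{\mathbb{E}}\left[(1+r)^{-n}g(S_n)\right],$$

and this last expression is the value of the European derivative security. Of course, the LHS cannot be strictly less than the RHS above, since stopping at time $n$ is always allowed, and we conclude that

$$\max_\tau \tilde{\mathbb{E}}\left[(1+r)^{-\tau}g(S_\tau)\right] = \tilde{\mathbb{E}}\left[(1+r)^{-n}g(S_n)\right].$$

∎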
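The corollary can be seen numerically by running the American algorithm of Section 5.1 next to the European recursion. In this Python sketch (the function `price` is hypothetical), the two prices agree for the call $g(x) = (x - 5)^+$ in the two-period model of the earlier examples, but not for the put $g(x) = (5 - x)^+$: the put payoff is convex but has $g(0) = 5 \ne 0$, so the corollary does not apply to it.

```python
def price(S0, n, u, d, r, g, american):
    """Binomial price of the payoff g(S_k); early exercise is allowed if `american` is True."""
    p = (1 + r - d) / (u - d)
    q = 1 - p
    v = [g(S0 * u**j * d**(n - j)) for j in range(n + 1)]
    for k in range(n - 1, -1, -1):
        cont = [(p * v[j + 1] + q * v[j]) / (1 + r) for j in range(k + 1)]
        if american:
            v = [max(c, g(S0 * u**j * d**(k - j))) for j, c in enumerate(cont)]
        else:
            v = cont
    return v[0]


def call(x):
    return max(x - 5.0, 0.0)   # convex, g(0) = 0: the corollary applies


def put(x):
    return max(5.0 - x, 0.0)   # convex, but g(0) = 5, so the corollary does not apply


args = (4.0, 2, 2.0, 0.5, 0.25)   # S0, n, u, d, r from the earlier examples
results = {name: (price(*args, g, True), price(*args, g, False))
           for name, g in [("call", call), ("put", put)]}
print(results)   # name -> (American price, European price)
```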
Figure 7.3: A three period binomial model.


7.3 Stopped Martingales 


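Let $\{Y_k\}_{k=0}^n$ be a stochastic process and let $\tau$ be a stopping time. We denote by $\{Y_{k \wedge \tau}\}_{k=0}^n$ the stopped process

$$Y_{k \wedge \tau(\omega)}(\omega),\qquad k = 0,1,\ldots,n.$$

Example 7.1 (Stopped Process) Figure 7.3 shows our familiar 3-period binomial example. Define

$$\tau(\omega) = \begin{cases} 1 & \text{if } \omega_1 = T,\\ 2 & \text{if } \omega_1 = H. \end{cases}$$

Then

$$S_{2 \wedge \tau}(\omega) = \begin{cases} S_2(HH) = 16 & \text{if } \omega = HH,\\ S_2(HT) = 4 & \text{if } \omega = HT,\\ S_1(T) = 2 & \text{if } \omega = TH,\\ S_1(T) = 2 & \text{if } \omega = TT. \end{cases}$$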


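Theorem 3.26 A stopped martingale (or submartingale, or supermartingale) is still a martingale (or submartingale, or supermartingale respectively).

Proof: Let $\{Y_k\}_{k=0}^n$ be a martingale, and $\tau$ be a stopping time. Choose some $k \in \{0,1,\ldots,n-1\}$. The set $\{\tau \le k\}$ is in $\mathcal{F}_k$, so the set $\{\tau \ge k+1\} = \{\tau \le k\}^c$ is also in $\mathcal{F}_k$. We compute

$$
\begin{aligned}
\mathbb{E}\left[Y_{(k+1) \wedge \tau} \mid \mathcal{F}_k\right] &= \mathbb{E}\left[\mathbb{I}_{\{\tau \le k\}}Y_\tau + \mathbb{I}_{\{\tau \ge k+1\}}Y_{k+1} \mid \mathcal{F}_k\right]\\
&= \mathbb{I}_{\{\tau \le k\}}Y_\tau + \mathbb{I}_{\{\tau \ge k+1\}}\,\mathbb{E}[Y_{k+1} \mid \mathcal{F}_k]\\
&= \mathbb{I}_{\{\tau \le k\}}Y_\tau + \mathbb{I}_{\{\tau \ge k+1\}}Y_k\\
&= Y_{k \wedge \tau}.
\end{aligned}
$$

∎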
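For Example 7.1 the theorem can be checked atom by atom. The Python sketch below (names hypothetical) takes $Y_k = (1+r)^{-k}S_k$, a martingale under the risk-neutral measure with the assumed parameters $u = 2$, $d = 1/2$, $r = 1/4$, $S_0 = 4$, and verifies $\tilde{\mathbb{E}}[Y_{(k+1)\wedge\tau} \mid \mathcal{F}_k] = Y_{k\wedge\tau}$ on every atom of $\mathcal{F}_k$ for the stopping time $\tau$ of that example.

```python
from fractions import Fraction
from itertools import product

u, d, r = Fraction(2), Fraction(1, 2), Fraction(1, 4)
p = (1 + r - d) / (u - d)          # risk-neutral probability of H (= 1/2)
q = 1 - p
n, S0 = 2, Fraction(4)


def tau(w):
    """Stopping time of Example 7.1: stop at 1 if the first toss is T, else at 2."""
    return 1 if w[0] == "T" else 2


def Y(w, k):
    """Discounted stock price (1+r)^{-k} S_k(w)."""
    return S0 * u ** w[:k].count("H") * d ** w[:k].count("T") / (1 + r) ** k


def cond_exp_stopped(prefix):
    """E~[Y_{(k+1) ^ tau} | F_k] on the atom fixed by the first k = len(prefix) tosses."""
    k = len(prefix)
    num = den = Fraction(0)
    for rest in map("".join, product("HT", repeat=n - k)):
        w = prefix + rest
        weight = p ** w.count("H") * q ** w.count("T")
        num += weight * Y(w, min(k + 1, tau(w)))
        den += weight
    return num / den


for k in range(n):
    for prefix in map("".join, product("HT", repeat=k)):
        w = prefix + "H" * (n - k)                 # any completion; Y_{k ^ tau} is F_k-measurable
        assert cond_exp_stopped(prefix) == Y(w, min(k, tau(w)))
print("stopped discounted stock price is a martingale on every atom")
```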
Chapter 8


Random Walks 


8.1 First Passage Time 


Toss a coin infinitely many times. Then the sample space $\Omega$ is the set of all infinite sequences $\omega = (\omega_1, \omega_2, \ldots)$ of $H$ and $T$. Assume the tosses are independent, and on each toss, the probability of $H$ is $\frac{1}{2}$, as is the probability of $T$. Define

$$Y_j(\omega) = \begin{cases} 1 & \text{if } \omega_j = H,\\ -1 & \text{if } \omega_j = T, \end{cases}$$

$$M_0 = 0,\qquad M_k = \sum_{j=1}^k Y_j,\quad k = 1,2,\ldots$$

The process $\{M_k\}_{k=0}^\infty$ is a symmetric random walk (see Fig. 8.1). Its analogue in continuous time is Brownian motion.


Define

$$\tau = \min\{k \ge 0;\ M_k = 1\}.$$

If $M_k$ never gets to 1 (e.g., $\omega = (TTTT\ldots)$), then $\tau = \infty$. The random variable $\tau$ is called the first passage time to 1. It is the first time the number of heads exceeds by one the number of tails.


8.2 $\tau$ is almost surely finite


It is shown in a Homework Problem that $\{M_k\}_{k=0}^\infty$ and $\{N_k\}_{k=0}^\infty$, where
$$N_k = \exp\left\{\theta M_k - k \log\left(\frac{e^\theta + e^{-\theta}}{2}\right)\right\} = e^{\theta M_k}\left(\frac{2}{e^\theta + e^{-\theta}}\right)^k,$$
are martingales. (Take $M_k = -S_k$ in part (i) of the Homework Problem and take $\theta = -\sigma$ in part (v).)

Figure 8.1: The random walk process $M_k$.

Figure 8.2: Illustrating two functions of $\theta$.

Since $N_0 = 1$ and a stopped martingale is a martingale, we have


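$$1 = \mathbb{E} N_{k\wedge\tau} = \mathbb{E}\left[e^{\theta M_{k\wedge\tau}}\left(\frac{2}{e^\theta + e^{-\theta}}\right)^{k\wedge\tau}\right] \tag{2.1}$$
for every fixed $\theta \in \mathbb{R}$ (see Fig. 8.2 for an illustration of the various functions involved). We want
to let $k \to \infty$ in (2.1), but we have to worry a bit that for some sequences $\omega \in \Omega$, $\tau(\omega) = \infty$.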


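We consider fixed $\theta > 0$, so
$$\frac{2}{e^\theta + e^{-\theta}} < 1.$$
As $k \to \infty$,
$$\left(\frac{2}{e^\theta + e^{-\theta}}\right)^{k\wedge\tau} \to \begin{cases} \left(\frac{2}{e^\theta + e^{-\theta}}\right)^\tau & \text{if } \tau < \infty, \\ 0 & \text{if } \tau = \infty. \end{cases}$$
Furthermore, $M_{k\wedge\tau} \le 1$, because we stop this martingale when it reaches 1, so
$$0 \le e^{\theta M_{k\wedge\tau}} \le e^\theta$$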
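and
$$0 \le e^{\theta M_{k\wedge\tau}}\left(\frac{2}{e^\theta + e^{-\theta}}\right)^{k\wedge\tau} \le e^\theta.$$
In addition, since $M_{k\wedge\tau} = M_\tau = 1$ for all large $k$ when $\tau < \infty$,
$$\lim_{k\to\infty} e^{\theta M_{k\wedge\tau}}\left(\frac{2}{e^\theta + e^{-\theta}}\right)^{k\wedge\tau} = \begin{cases} e^\theta\left(\frac{2}{e^\theta + e^{-\theta}}\right)^\tau & \text{if } \tau < \infty, \\ 0 & \text{if } \tau = \infty. \end{cases}$$
Recall Equation (2.1):
$$\mathbb{E}\left[e^{\theta M_{k\wedge\tau}}\left(\frac{2}{e^\theta + e^{-\theta}}\right)^{k\wedge\tau}\right] = 1.$$
Letting $k \to \infty$, and using the Bounded Convergence Theorem, we obtain
$$\mathbb{E}\left[\mathbb{1}_{\{\tau<\infty\}}\, e^\theta\left(\frac{2}{e^\theta + e^{-\theta}}\right)^\tau\right] = 1. \tag{2.2}$$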
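For all $\theta \in (0, 1]$, we have
$$0 \le e^\theta\left(\frac{2}{e^\theta + e^{-\theta}}\right)^\tau \mathbb{1}_{\{\tau<\infty\}} \le e,$$
so we can let $\theta \downarrow 0$ in (2.2), using the Bounded Convergence Theorem again, to conclude
$$\mathbb{E}\left[\mathbb{1}_{\{\tau<\infty\}}\right] = 1,$$
i.e.,
$$\mathbb{P}\{\tau < \infty\} = 1.$$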


We know there are paths of the symmetric random walk $\{M_k\}_{k=0}^\infty$ which never reach level 1. We
have just shown that these paths collectively have no probability. (In our infinite sample space $\Omega$,
each path individually has zero probability.) We therefore do not need the indicator $\mathbb{1}_{\{\tau<\infty\}}$ in
(2.2), and we rewrite that equation as
$$\mathbb{E}\left[\left(\frac{2}{e^\theta + e^{-\theta}}\right)^\tau\right] = e^{-\theta}. \tag{2.3}$$
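A Monte Carlo sketch of (2.3) for $\theta = 1$. Paths are cut off at a horizon of 200 tosses (an illustrative choice); a path that has not reached 1 by then contributes 0, which understates its true contribution by at most $\left(\frac{2}{e^\theta+e^{-\theta}}\right)^{200}$, negligible for this $\theta$.

```python
import math
import random


def tau_truncated(rng, horizon):
    """First passage time to 1 of a simulated symmetric random walk,
    or None if level 1 is not reached within `horizon` tosses."""
    m = 0
    for k in range(1, horizon + 1):
        m += 1 if rng.random() < 0.5 else -1
        if m == 1:
            return k
    return None


def estimate_lhs(theta, n_paths=50_000, horizon=200, seed=3):
    """Monte Carlo estimate of E[(2 / (e^theta + e^-theta))^tau]."""
    rng = random.Random(seed)
    base = 2.0 / (math.exp(theta) + math.exp(-theta))
    total = 0.0
    for _ in range(n_paths):
        t = tau_truncated(rng, horizon)
        if t is not None:          # truncated paths contribute (almost) nothing
            total += base ** t
    return total / n_paths


theta = 1.0
estimate = estimate_lhs(theta)
print(estimate, math.exp(-theta))
```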


8.3 The moment generating function for $\tau$


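Let $\alpha \in (0, 1)$ be given. We want to find $\theta > 0$ so that
$$\alpha = \frac{2}{e^\theta + e^{-\theta}}.$$
Solution:
$$\alpha e^\theta + \alpha e^{-\theta} - 2 = 0,$$
$$\alpha\left(e^{-\theta}\right)^2 - 2e^{-\theta} + \alpha = 0,$$
$$e^{-\theta} = \frac{1 \pm \sqrt{1 - \alpha^2}}{\alpha}.$$
We want $\theta > 0$, so we must have $e^{-\theta} < 1$. Now $0 < \alpha < 1$, so
$$0 < (1 - \alpha)^2 < (1 - \alpha)(1 + \alpha) = 1 - \alpha^2,$$
$$1 - \alpha < \sqrt{1 - \alpha^2},$$
$$1 - \sqrt{1 - \alpha^2} < \alpha,$$
$$\frac{1 - \sqrt{1 - \alpha^2}}{\alpha} < 1,$$
whereas $\frac{1 + \sqrt{1 - \alpha^2}}{\alpha} > \frac{1}{\alpha} > 1$. We take the negative square root:
$$e^{-\theta} = \frac{1 - \sqrt{1 - \alpha^2}}{\alpha}.$$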

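Recall Equation (2.3):
$$\mathbb{E}\left[\left(\frac{2}{e^\theta + e^{-\theta}}\right)^\tau\right] = e^{-\theta}, \quad \theta > 0.$$
With $\alpha \in (0, 1)$ and $\theta > 0$ related by
$$e^{-\theta} = \frac{1 - \sqrt{1 - \alpha^2}}{\alpha}, \qquad \alpha = \frac{2}{e^\theta + e^{-\theta}},$$
this becomes
$$\mathbb{E}\,\alpha^\tau = \frac{1 - \sqrt{1 - \alpha^2}}{\alpha}, \quad 0 < \alpha < 1. \tag{3.1}$$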


We have computed the moment generating function for the first passage time to 1. 
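The algebra above can be checked numerically: for $\alpha = \frac{2}{e^\theta + e^{-\theta}}$, the two roots of $\alpha x^2 - 2x + \alpha = 0$ should be $e^{-\theta}$ and $e^{\theta}$ (their product is 1), and (3.1) should return $e^{-\theta}$. The sample values of $\theta$ are arbitrary.

```python
import math


def roots_for_alpha(alpha):
    """Both roots x = e^{-theta} of alpha*x**2 - 2*x + alpha = 0 (smaller first)."""
    disc = math.sqrt(1.0 - alpha * alpha)
    return (1.0 - disc) / alpha, (1.0 + disc) / alpha


def mgf_tau(alpha):
    """Closed form (3.1): E[alpha^tau] = (1 - sqrt(1 - alpha^2)) / alpha."""
    return (1.0 - math.sqrt(1.0 - alpha * alpha)) / alpha


for theta in (0.1, 0.7, 3.0):
    alpha = 2.0 / (math.exp(theta) + math.exp(-theta))   # alpha = 1 / cosh(theta)
    small, large = roots_for_alpha(alpha)
    print(f"theta={theta}: small root={small:.12f}, "
          f"e^-theta={math.exp(-theta):.12f}, large root={large:.4f}")
```

Only the smaller root lies in $(0, 1)$, which is why the negative square root is taken.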


8.4 Expectation of $\tau$


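Recall that
$$\mathbb{E}\,\alpha^\tau = \frac{1 - \sqrt{1 - \alpha^2}}{\alpha}, \quad 0 < \alpha < 1,$$
so
$$\mathbb{E}\left[\tau \alpha^{\tau-1}\right] = \frac{d}{d\alpha}\,\mathbb{E}\,\alpha^\tau = \frac{d}{d\alpha}\left(\frac{1 - \sqrt{1 - \alpha^2}}{\alpha}\right) = \frac{1 - \sqrt{1 - \alpha^2}}{\alpha^2\sqrt{1 - \alpha^2}}.$$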
Using the Monotone Convergence Theorem, we can let $\alpha \uparrow 1$ in the equation
$$\mathbb{E}\left[\tau \alpha^{\tau-1}\right] = \frac{1 - \sqrt{1 - \alpha^2}}{\alpha^2\sqrt{1 - \alpha^2}}$$
to obtain
$$\mathbb{E}\,\tau = \infty.$$


Thus in summary:
$$\tau \triangleq \min\{k;\ M_k = 1\}, \qquad \mathbb{P}\{\tau < \infty\} = 1, \qquad \mathbb{E}\,\tau = \infty.$$
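A quick numerical illustration of the last step, assuming nothing beyond (3.1): a central difference of $\mathbb{E}\,\alpha^\tau$ agrees with the closed form for $\mathbb{E}[\tau\alpha^{\tau-1}]$, and that closed form grows without bound as $\alpha \uparrow 1$. The step size and sample points are arbitrary.

```python
import math


def mean_tau_alpha(alpha):
    """E[tau * alpha^(tau-1)] = (1 - sqrt(1 - a^2)) / (a^2 * sqrt(1 - a^2))."""
    s = math.sqrt(1.0 - alpha * alpha)
    return (1.0 - s) / (alpha * alpha * s)


def mgf_tau(alpha):
    """E[alpha^tau] from (3.1)."""
    return (1.0 - math.sqrt(1.0 - alpha * alpha)) / alpha


# Central-difference derivative of the MGF should match the closed form.
a, h = 0.5, 1e-6
numeric = (mgf_tau(a + h) - mgf_tau(a - h)) / (2 * h)
print(numeric, mean_tau_alpha(a))

# As alpha increases to 1 the expression blows up, consistent with E[tau] = infinity.
for alpha in (0.9, 0.99, 0.999, 0.9999):
    print(alpha, mean_tau_alpha(alpha))
```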


8.5 The Strong Markov Property 


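The random walk process $\{M_k\}_{k=0}^\infty$ is a Markov process, i.e.,
$$\mathbb{E}\left[\text{random variable depending only on } M_{k+1}, M_{k+2}, \ldots \mid \mathcal{F}_k\right] = \mathbb{E}\left[\text{same random variable} \mid M_k\right].$$
In discrete time, this Markov property implies the Strong Markov property:
$$\mathbb{E}\left[\text{random variable depending only on } M_{\tau+1}, M_{\tau+2}, \ldots \mid \mathcal{F}_\tau\right] = \mathbb{E}\left[\text{same random variable} \mid M_\tau\right]$$
for any almost surely finite stopping time $\tau$.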


8.6 General First Passage Times 


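Define
$$\tau_m \triangleq \min\{k \ge 0;\ M_k = m\}, \quad m = 1, 2, \ldots$$
Then $\tau_2 - \tau_1$ is the number of periods between the first arrival at level 1 and the first arrival at level
2. The distribution of $\tau_2 - \tau_1$ is the same as the distribution of $\tau_1$ (see Fig. 8.3), i.e.,
$$\mathbb{E}\,\alpha^{\tau_2 - \tau_1} = \frac{1 - \sqrt{1 - \alpha^2}}{\alpha}, \quad \alpha \in (0, 1).$$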
Figure 8.3: General first passage times.


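For $\alpha \in (0, 1)$,
$$\begin{aligned}
\mathbb{E}\left[\alpha^{\tau_2} \mid \mathcal{F}_{\tau_1}\right] &= \mathbb{E}\left[\alpha^{\tau_1}\alpha^{\tau_2-\tau_1} \mid \mathcal{F}_{\tau_1}\right] \\
&= \alpha^{\tau_1}\,\mathbb{E}\left[\alpha^{\tau_2-\tau_1} \mid \mathcal{F}_{\tau_1}\right] && \text{(taking out what is known)} \\
&= \alpha^{\tau_1}\,\mathbb{E}\left[\alpha^{\tau_2-\tau_1} \mid M_{\tau_1}\right] && \text{(strong Markov property)} \\
&= \alpha^{\tau_1}\,\mathbb{E}\left[\alpha^{\tau_2-\tau_1}\right] && \text{($M_{\tau_1} = 1$, not random)} \\
&= \alpha^{\tau_1} \cdot \frac{1 - \sqrt{1 - \alpha^2}}{\alpha}.
\end{aligned}$$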


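Take expectations of both sides to get
$$\mathbb{E}\,\alpha^{\tau_2} = \mathbb{E}\,\alpha^{\tau_1} \cdot \frac{1 - \sqrt{1 - \alpha^2}}{\alpha} = \left(\frac{1 - \sqrt{1 - \alpha^2}}{\alpha}\right)^2.$$
In general,
$$\mathbb{E}\,\alpha^{\tau_m} = \left(\frac{1 - \sqrt{1 - \alpha^2}}{\alpha}\right)^m, \quad \alpha \in (0, 1).$$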
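The formula for $\mathbb{E}\,\alpha^{\tau_m}$ can be checked against the exact distribution of $\tau_m$. The sketch below computes that distribution by propagating the law of the walk on the paths that have not yet reached level $m$ (a dynamic-programming device chosen for the check, not a method from the text). The series is truncated at 300 steps, which is harmless for $\alpha = 0.8$.

```python
import math


def first_passage_pmf(level, n_max):
    """P{tau_level = k}, k = 0..n_max, for the symmetric walk started at 0 (level >= 1)."""
    alive = {0: 1.0}                 # position -> probability, not yet at `level`
    pmf = [0.0] * (n_max + 1)
    for k in range(1, n_max + 1):
        nxt = {}
        for pos, prob in alive.items():
            for step in (1, -1):
                new = pos + step
                if new == level:
                    pmf[k] += 0.5 * prob
                else:
                    nxt[new] = nxt.get(new, 0.0) + 0.5 * prob
        alive = nxt
    return pmf


def mgf_tau1(alpha):
    """E[alpha^{tau_1}] from (3.1)."""
    return (1.0 - math.sqrt(1.0 - alpha * alpha)) / alpha


alpha = 0.8
results = {}
for m in (1, 2, 3):
    pmf = first_passage_pmf(m, 300)
    truncated = sum(p * alpha ** k for k, p in enumerate(pmf))
    results[m] = (truncated, mgf_tau1(alpha) ** m)
    print(m, *results[m])
```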


8.7 Example: Perpetual American Put 


Consider the binomial model, with $u = 2$, $d = \frac{1}{2}$, $r = \frac{1}{4}$, and payoff function $(5 - S_k)^+$. The risk-neutral
probabilities are $\tilde p = \frac{1}{2}$, $\tilde q = \frac{1}{2}$, and thus
$$S_k = S_0 u^{M_k},$$
where $M_k$ is a symmetric random walk under the risk-neutral measure, denoted by $\widetilde{\mathbb{P}}$. Suppose
$S_0 = 4$. Here are some possible exercise rules:


Rule 0: Stop immediately. $\tau_0 = 0$, $V^{(\tau_0)} = 1$.
Rule 1: Stop as soon as the stock price falls to 2, i.e., at time
$$\tau_{-1} \triangleq \min\{k;\ M_k = -1\}.$$
Rule 2: Stop as soon as the stock price falls to 1, i.e., at time
$$\tau_{-2} \triangleq \min\{k;\ M_k = -2\}.$$
Because the random walk is symmetric under $\widetilde{\mathbb{P}}$, $\tau_{-m}$ has the same distribution under $\widetilde{\mathbb{P}}$ as the
stopping time $\tau_m$ in the previous section. This observation leads to the following computations of
value. Value of Rule 1:


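$$V^{(\tau_{-1})} = \widetilde{\mathbb{E}}\left[\left(\tfrac{4}{5}\right)^{\tau_{-1}}(5 - S_{\tau_{-1}})^+\right] = (5 - 2)\,\widetilde{\mathbb{E}}\left[\left(\tfrac{4}{5}\right)^{\tau_{-1}}\right] = 3 \cdot \frac{1 - \sqrt{1 - (4/5)^2}}{4/5} = 3 \cdot \frac{1}{2} = \frac{3}{2}.$$
Value of Rule 2:
$$V^{(\tau_{-2})} = (5 - 1)\,\widetilde{\mathbb{E}}\left[\left(\tfrac{4}{5}\right)^{\tau_{-2}}\right] = 4 \cdot \left(\frac{1}{2}\right)^2 = 1.$$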


This suggests that the optimal rule is Rule 1, i.e., stop (exercise the put) as soon as the stock price
falls to 2, and the value of the put is $\frac{3}{2}$ if $S_0 = 4$.
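The rule values above can be reproduced with exact fractions. The sketch assumes, as the computation above does, that waiting for the walk to fall $m$ levels takes $\tau_{-m}$ periods with $\widetilde{\mathbb{E}}\left[(4/5)^{\tau_{-m}}\right] = \left(\frac{1}{2}\right)^m$, and scans a few exercise levels $S_0/2^m$; `rule_value` is an illustrative name.

```python
import math
from fractions import Fraction

STRIKE = 5
# E~[(4/5)^{tau_{-1}}] from (3.1) with alpha = 4/5: (1 - 3/5) / (4/5) = 1/2.
mgf_at_4_5 = (1 - math.sqrt(1 - 0.8 ** 2)) / 0.8   # 0.5 up to rounding
HALF = Fraction(1, 2)


def rule_value(s0, m):
    """Time-0 value of exercising the first time the price has fallen m levels,
    i.e. at price s0 / 2**m. By the strong Markov property tau_{-m} is a sum of
    m independent copies of tau_{-1}, so E~[(4/5)^{tau_{-m}}] = (1/2)**m."""
    exercise_price = Fraction(s0) / 2 ** m
    payoff = max(STRIKE - exercise_price, 0)
    return payoff * HALF ** m


for m in range(5):
    print(m, rule_value(4, m))
```

Among these levels the value is largest at $m = 1$, i.e., exercising at price 2.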


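Suppose instead we start with $S_0 = 8$, and stop the first time the price falls to 2. This requires 2
down steps, so the value of this rule with this initial stock price is
$$(5 - 2)\,\widetilde{\mathbb{E}}\left[\left(\tfrac{4}{5}\right)^{\tau_{-2}}\right] = 3 \cdot \left(\frac{1}{2}\right)^2 = \frac{3}{4}.$$
In general, if $S_0 = 2^j$ for some $j \ge 1$, and we stop when the stock price falls to 2, then $j - 1$ down
steps will be required and the value of the option is
$$(5 - 2)\,\widetilde{\mathbb{E}}\left[\left(\tfrac{4}{5}\right)^{\tau_{-(j-1)}}\right] = 3 \cdot \left(\frac{1}{2}\right)^{j-1}.$$
We define
$$v(2^j) \triangleq 3 \cdot \left(\frac{1}{2}\right)^{j-1}, \quad j = 1, 2, 3, \ldots$$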
If $S_0 = 2^j$ for some $j \le 1$, then the initial price is at or below 2. In this case, we exercise
immediately, and the value of the put is
$$v(2^j) \triangleq 5 - 2^j, \quad j = 1, 0, -1, -2, \ldots$$


Proposed exercise rule: Exercise the put whenever the stock price is at or below 2. The value of
this rule is given by $v(2^j)$ as we just defined it. Since the put is perpetual, the initial time is no
different from any other time. This leads us to make the following:


Conjecture 1 The value of the perpetual put at time $k$ is $v(S_k)$.


How do we recognize the value of an American derivative security when we see it? 


There are three parts to the proof of the conjecture. We must show:
(a) $v(S_k) \ge (5 - S_k)^+$ for all $k$,
(b) $\left\{\left(\frac{4}{5}\right)^k v(S_k)\right\}_{k=0}^\infty$ is a supermartingale,
(c) $\{v(S_k)\}_{k=0}^\infty$ is the smallest process with properties (a) and (b).

Note: To simplify matters, we shall only consider initial stock prices of the form $S_0 = 2^j$, so $S_k$ is
always of the form $2^j$, with a possibly different $j$.


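Proof: (a). Just check that
$$v(2^j) \ge (5 - 2^j)^+ \quad \text{for all } j.$$
This is straightforward. ∎
Proof: (b). We must show that
$$v(S_k) \ge \widetilde{\mathbb{E}}\left[\tfrac{4}{5}\, v(S_{k+1}) \mid \mathcal{F}_k\right] = \tfrac{4}{5} \cdot \tfrac{1}{2}\, v(2S_k) + \tfrac{4}{5} \cdot \tfrac{1}{2}\, v\left(\tfrac{1}{2}S_k\right).$$
By assumption, $S_k = 2^j$ for some $j$. We must show that
$$v(2^j) \ge \tfrac{2}{5}\, v(2^{j+1}) + \tfrac{2}{5}\, v(2^{j-1}).$$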


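If $j \ge 2$, then $v(2^j) = 3 \cdot \left(\frac{1}{2}\right)^{j-1}$ and
$$\tfrac{2}{5}\, v(2^{j+1}) + \tfrac{2}{5}\, v(2^{j-1}) = \tfrac{2}{5} \cdot 3\left(\tfrac{1}{2}\right)^j + \tfrac{2}{5} \cdot 3\left(\tfrac{1}{2}\right)^{j-2} = 3\left(\tfrac{1}{2}\right)^{j-1}\left[\tfrac{1}{5} + \tfrac{4}{5}\right] = v(2^j).$$
If $j = 1$, then $v(2^j) = v(2) = 3$ and
$$\tfrac{2}{5}\, v(4) + \tfrac{2}{5}\, v(1) = \tfrac{2}{5} \cdot \tfrac{3}{2} + \tfrac{2}{5} \cdot 4 = \tfrac{11}{5}.$$
There is a gap of size $\frac{4}{5}$.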


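If $j \le 0$, then $v(2^j) = 5 - 2^j$ and
$$\tfrac{2}{5}\, v(2^{j+1}) + \tfrac{2}{5}\, v(2^{j-1}) = \tfrac{2}{5}\left(5 - 2^{j+1}\right) + \tfrac{2}{5}\left(5 - 2^{j-1}\right) = 4 - \tfrac{2}{5}\left(2 + \tfrac{1}{2}\right)2^j = 4 - 2^j.$$
There is a gap of size 1. This concludes the proof of (b). ∎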
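A sketch that checks property (a) and the gaps in (b) with exact arithmetic over a range of $j$; `v` and `gap` are illustrative names for $v(2^j)$ and the slack in the supermartingale inequality.

```python
from fractions import Fraction


def v(j):
    """Candidate value v(2**j) of the perpetual put (strike 5, u=2, d=1/2, r=1/4)."""
    if j >= 1:
        return 3 * Fraction(1, 2) ** (j - 1)
    return 5 - Fraction(2) ** j


def gap(j):
    """v(2^j) - [(2/5) v(2^{j+1}) + (2/5) v(2^{j-1})]; (b) needs this >= 0."""
    return v(j) - (Fraction(2, 5) * v(j + 1) + Fraction(2, 5) * v(j - 1))


for j in range(-3, 6):
    payoff = max(5 - Fraction(2) ** j, 0)
    print(j, v(j), payoff, gap(j))
```

The printed gaps are 0 for $j \ge 2$, $\frac{4}{5}$ for $j = 1$ and 1 for $j \le 0$, matching the three cases above.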
Proof: (c). Suppose $\{Y_k\}_{k=0}^\infty$ is some other process satisfying:
(a′) $Y_k \ge (5 - S_k)^+$ for all $k$,
(b′) $\left\{\left(\frac{4}{5}\right)^k Y_k\right\}_{k=0}^\infty$ is a supermartingale.
We must show that
$$Y_k \ge v(S_k) \quad \text{for all } k. \tag{7.1}$$
Actually, since the put is perpetual, every time $k$ is like every other time, so it will suffice to show
$$Y_0 \ge v(S_0), \tag{7.2}$$


provided we let $S_0$ in (7.2) be any number of the form $2^j$. With appropriate (but messy) conditioning
on $\mathcal{F}_k$, the proof we give of (7.2) can be modified to prove (7.1).


For $j \le 1$,
$$v(2^j) = 5 - 2^j = (5 - 2^j)^+,$$
so if $S_0 = 2^j$ for some $j \le 1$, then (a′) implies
$$Y_0 \ge (5 - 2^j)^+ = v(S_0).$$
Suppose now that $S_0 = 2^j$ for some $j \ge 2$, i.e., $S_0 \ge 4$. Let
$$\tau = \min\{k;\ S_k = 2\} = \min\{k;\ M_k = -(j-1)\}.$$
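Then
$$v(S_0) = v(2^j) = 3 \cdot \left(\tfrac{1}{2}\right)^{j-1} = (5 - 2)\,\widetilde{\mathbb{E}}\left[\left(\tfrac{4}{5}\right)^\tau\right] = \widetilde{\mathbb{E}}\left[\left(\tfrac{4}{5}\right)^\tau (5 - S_\tau)^+\right].$$
Because $\left\{\left(\frac{4}{5}\right)^k Y_k\right\}_{k=0}^\infty$ is a supermartingale,
$$Y_0 \ge \widetilde{\mathbb{E}}\left[\left(\tfrac{4}{5}\right)^\tau Y_\tau\right] \ge \widetilde{\mathbb{E}}\left[\left(\tfrac{4}{5}\right)^\tau (5 - S_\tau)^+\right] = v(S_0).$$
∎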


Comment on the proof of (c): If the candidate value process is the actual value of a particular
exercise rule, then (c) will be automatically satisfied. In this case, we constructed $v$ so that $v(S_k)$ is
the value of the put at time $k$ if the stock price at time $k$ is $S_k$ and if we exercise the put the first time
($k$, or later) that the stock price is 2 or less. In such a situation, we need only verify properties (a)
and (b).


8.8 Difference Equation 


If we imagine stock prices which can fall at any point in $(0, \infty)$, not just at points of the form $2^j$ for
integers $j$, then we can imagine the function $v(x)$, defined for all $x > 0$, which gives the value of
the perpetual American put when the stock price is $x$. This function should satisfy the conditions:

(a) $v(x) \ge (K - x)^+$ for all $x$,
(b) $v(x) \ge \frac{1}{1+r}\left[\tilde p\, v(ux) + \tilde q\, v(dx)\right]$ for all $x$,
(c) At each $x$, either (a) or (b) holds with equality.


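In the example we worked out, we have
$$\text{For } j \ge 1: \quad v(2^j) = 3 \cdot \left(\tfrac{1}{2}\right)^{j-1} = \frac{6}{2^j},$$
$$\text{For } j \le 1: \quad v(2^j) = 5 - 2^j.$$
This suggests the formula
$$v(x) = \begin{cases} \dfrac{6}{x}, & x \ge 3, \\[4pt] 5 - x, & 0 < x \le 3. \end{cases}$$
We then have (see Fig. 8.4):
(a) $v(x) \ge (5 - x)^+$ for all $x$,
(b) $v(x) \ge \frac{4}{5}\left[\frac{1}{2}v(2x) + \frac{1}{2}v\left(\frac{x}{2}\right)\right]$ for every $x$ except for $4 < x < 6$.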
Figure 8.4: Graph of $v(x)$.


Check of condition (c): 


• If $0 < x \le 3$, then (a) holds with equality.

• If $x \ge 6$, then (b) holds with equality:
$$\frac{4}{5}\left[\frac{1}{2} \cdot \frac{6}{2x} + \frac{1}{2} \cdot \frac{12}{x}\right] = \frac{6}{x}.$$

• If $3 < x < 4$ or $4 < x < 6$, then both (a) and (b) are strict. This is an artifact of the
discreteness of the binomial model. This artifact will disappear in the continuous model, in
which an analogue of (a) or (b) holds with equality at every point.
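The slack in conditions (a) and (b) for the suggested formula can be scanned numerically. The sketch evaluates both on a grid $x \in (0, 10]$ with step $0.01$ and reports where (b) fails and where neither condition holds with equality; the grid and the tolerance are arbitrary choices.

```python
def v(x):
    """Suggested continuous extension: 6/x for x >= 3, 5 - x for 0 < x <= 3."""
    return 6.0 / x if x >= 3 else 5.0 - x


def cond_a_slack(x):
    """v(x) - (5 - x)^+ ; (a) requires this to be >= 0."""
    return v(x) - max(5.0 - x, 0.0)


def cond_b_slack(x):
    """v(x) - (4/5)[v(2x)/2 + v(x/2)/2] ; (b) requires this to be >= 0."""
    return v(x) - 0.8 * (0.5 * v(2 * x) + 0.5 * v(x / 2))


grid = [k / 100 for k in range(1, 1001)]          # x in (0, 10]
tol = 1e-12
fails_b = [x for x in grid if cond_b_slack(x) < -tol]
neither_tight = [x for x in grid
                 if abs(cond_a_slack(x)) > tol and abs(cond_b_slack(x)) > tol]
print("(a) violated anywhere:", any(cond_a_slack(x) < -tol for x in grid))
print("(b) fails for x from", min(fails_b), "to", max(fails_b))
print("neither tight for x from", min(neither_tight), "to", max(neither_tight))
```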
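8.9 Distribution of First Passage Times

Let $\{M_k\}_{k=0}^\infty$ be a symmetric random walk under a probability measure $\mathbb{P}$, with $M_0 = 0$. Defining
$$\tau = \min\{k \ge 0;\ M_k = 1\},$$
we recall that
$$\mathbb{E}\,\alpha^\tau = \frac{1 - \sqrt{1 - \alpha^2}}{\alpha}, \quad 0 < \alpha < 1.$$
We will use this moment generating function to obtain the distribution of $\tau$. We first obtain the
Taylor series expansion of $\mathbb{E}\,\alpha^\tau$ as follows: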
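$$f(x) = 1 - \sqrt{1 - x}, \qquad f(0) = 0,$$
$$f'(x) = \tfrac{1}{2}(1 - x)^{-1/2}, \qquad f'(0) = \tfrac{1}{2},$$
$$f''(x) = \tfrac{1}{4}(1 - x)^{-3/2}, \qquad f''(0) = \tfrac{1}{4},$$
$$f'''(x) = \tfrac{3}{8}(1 - x)^{-5/2}, \qquad f'''(0) = \tfrac{3}{8},$$
$$\vdots$$
$$f^{(j)}(x) = \frac{1 \times 3 \times \cdots \times (2j - 3)}{2^j}(1 - x)^{-(2j-1)/2}, \quad j \ge 1,$$
$$f^{(j)}(0) = \frac{1 \times 3 \times \cdots \times (2j - 3)}{2^j} \cdot \frac{2 \times 4 \times \cdots \times (2j - 2)}{2^{j-1}(j-1)!} = \frac{(2j-2)!}{2^{2j-1}(j-1)!}.$$
Hence
$$f(x) = \sum_{j=1}^\infty \frac{x^j}{j!} f^{(j)}(0) = \sum_{j=1}^\infty x^j \frac{(2j-2)!}{2^{2j-1}\, j!\,(j-1)!},$$
and therefore
$$\mathbb{E}\,\alpha^\tau = \frac{1 - \sqrt{1 - \alpha^2}}{\alpha} = \frac{1}{\alpha} f(\alpha^2) = \frac{\alpha}{2} + \sum_{j=2}^\infty \left(\frac{\alpha}{2}\right)^{2j-1}\frac{(2j-2)!}{j!\,(j-1)!}.$$
But also,
$$\mathbb{E}\,\alpha^\tau = \sum_{j=1}^\infty \alpha^{2j-1}\,\mathbb{P}\{\tau = 2j - 1\}.$$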
Figure 8.5: Reflection principle.

Figure 8.6: Example with $j = 2$.


Therefore,
$$\mathbb{P}\{\tau = 1\} = \frac{1}{2},$$
$$\mathbb{P}\{\tau = 2j - 1\} = \left(\frac{1}{2}\right)^{2j-1}\frac{(2j-2)!}{j!\,(j-1)!}, \quad j = 2, 3, \ldots$$
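As a check on the Taylor-series argument, the sketch sums $\sum_j \alpha^{2j-1}\mathbb{P}\{\tau = 2j-1\}$ using the formula above and compares the partial sum with $\frac{1-\sqrt{1-\alpha^2}}{\alpha}$. The cutoff of 200 terms is an illustrative choice.

```python
import math


def p_tau(j):
    """P{tau = 2j - 1} = (1/2)^{2j-1} (2j-2)! / (j! (j-1)!); also gives 1/2 at j = 1."""
    ratio = math.factorial(2 * j - 2) / (math.factorial(j) * math.factorial(j - 1))
    return ratio * 0.5 ** (2 * j - 1)


def mgf_series(alpha, j_max):
    """Partial sum of sum_{j>=1} alpha^{2j-1} P{tau = 2j-1}."""
    return sum(alpha ** (2 * j - 1) * p_tau(j) for j in range(1, j_max + 1))


def mgf_closed(alpha):
    return (1 - math.sqrt(1 - alpha ** 2)) / alpha


for alpha in (0.3, 0.6, 0.9):
    print(alpha, mgf_series(alpha, 200), mgf_closed(alpha))
print([p_tau(j) for j in range(1, 5)])   # [0.5, 0.125, 0.0625, 0.0390625]
```

The probabilities $\frac{1}{2}, \frac{1}{8}, \frac{1}{16}, \frac{5}{128}, \ldots$ are the Catalan numbers divided by $2^{2j-1}$, since $\frac{(2j-2)!}{j!\,(j-1)!}$ is the $(j-1)$st Catalan number.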


8.10 The Reflection Principle 


To count how many paths reach level 1 by time $2j - 1$, count all those for which $M_{2j-1} = 1$ and
double count all those for which $M_{2j-1} \ge 3$. (See Figures 8.5, 8.6.)
In other words,
$$\begin{aligned}
\mathbb{P}\{\tau \le 2j - 1\} &= \mathbb{P}\{M_{2j-1} = 1\} + 2\,\mathbb{P}\{M_{2j-1} \ge 3\} \\
&= \mathbb{P}\{M_{2j-1} = 1\} + \mathbb{P}\{M_{2j-1} \ge 3\} + \mathbb{P}\{M_{2j-1} \le -3\} \\
&= 1 - \mathbb{P}\{M_{2j-1} = -1\}.
\end{aligned}$$


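For $j \ge 2$,
$$\begin{aligned}
\mathbb{P}\{\tau = 2j - 1\} &= \mathbb{P}\{\tau \le 2j - 1\} - \mathbb{P}\{\tau \le 2j - 3\} \\
&= \left[1 - \mathbb{P}\{M_{2j-1} = -1\}\right] - \left[1 - \mathbb{P}\{M_{2j-3} = -1\}\right] \\
&= \mathbb{P}\{M_{2j-3} = -1\} - \mathbb{P}\{M_{2j-1} = -1\} \\
&= \left(\frac{1}{2}\right)^{2j-3}\frac{(2j-3)!}{(j-1)!\,(j-2)!} - \left(\frac{1}{2}\right)^{2j-1}\frac{(2j-1)!}{j!\,(j-1)!} \\
&= \left(\frac{1}{2}\right)^{2j-1}\frac{(2j-3)!}{j!\,(j-1)!}\left[4j(j-1) - (2j-1)(2j-2)\right] \\
&= \left(\frac{1}{2}\right)^{2j-1}\frac{(2j-3)!}{j!\,(j-1)!}\left[2(j-1)\right] \\
&= \left(\frac{1}{2}\right)^{2j-1}\frac{(2j-2)!}{j!\,(j-1)!}.
\end{aligned}$$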
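The reflection identity $\mathbb{P}\{\tau \le 2j-1\} = 1 - \mathbb{P}\{M_{2j-1} = -1\}$ can be confirmed exactly for small $j$ by brute-force propagation of the walk with rational arithmetic. The horizon of 41 steps is arbitrary.

```python
import math
from fractions import Fraction


def tau_cdf_exact(n_max):
    """Exact P{tau <= k}, k = 0..n_max, by propagating the law of the walk
    on the paths that have not yet reached level 1."""
    alive = {0: Fraction(1)}
    absorbed = Fraction(0)
    cdf = [Fraction(0)]
    for _ in range(n_max):
        nxt = {}
        for pos, prob in alive.items():
            for step in (1, -1):
                new = pos + step
                if new == 1:
                    absorbed += prob / 2
                else:
                    nxt[new] = nxt.get(new, Fraction(0)) + prob / 2
        alive = nxt
        cdf.append(absorbed)
    return cdf


def prob_M_equals(n, m):
    """P{M_n = m} for the symmetric random walk ((n + m)/2 heads in n tosses)."""
    if (n + m) % 2 or abs(m) > n:
        return Fraction(0)
    return Fraction(math.comb(n, (n + m) // 2), 2 ** n)


cdf = tau_cdf_exact(41)
# Reflection principle: P{tau <= 2j - 1} = 1 - P{M_{2j-1} = -1}.
ok = all(cdf[2 * j - 1] == 1 - prob_M_equals(2 * j - 1, -1) for j in range(1, 22))
print(ok)
```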


Chapter 9 


Pricing in terms of Market Probabilities: 
The Radon-Nikodym Theorem. 


9.1 Radon-Nikodym Theorem 


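Theorem 1.27 (Radon-Nikodym) Let $\mathbb{P}$ and $\widetilde{\mathbb{P}}$ be two probability measures on a space $(\Omega, \mathcal{F})$.
Assume that for every $A \in \mathcal{F}$ satisfying $\mathbb{P}(A) = 0$, we also have $\widetilde{\mathbb{P}}(A) = 0$. Then we say that
$\widetilde{\mathbb{P}}$ is absolutely continuous with respect to $\mathbb{P}$. Under this assumption, there is a nonnegative random
variable $Z$ such that
$$\widetilde{\mathbb{P}}(A) = \int_A Z\, d\mathbb{P} \quad \forall A \in \mathcal{F}, \tag{1.1}$$
and $Z$ is called the Radon-Nikodym derivative of $\widetilde{\mathbb{P}}$ with respect to $\mathbb{P}$.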


Remark 9.1 Equation (1.1) implies the apparently stronger condition
$$\widetilde{\mathbb{E}} X = \mathbb{E}[XZ]$$
for every random variable $X$ for which $\mathbb{E}|XZ| < \infty$.


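Remark 9.2 If $\widetilde{\mathbb{P}}$ is absolutely continuous with respect to $\mathbb{P}$, and $\mathbb{P}$ is absolutely continuous with
respect to $\widetilde{\mathbb{P}}$, we say that $\mathbb{P}$ and $\widetilde{\mathbb{P}}$ are equivalent. $\mathbb{P}$ and $\widetilde{\mathbb{P}}$ are equivalent if and only if
$$\mathbb{P}(A) = 0 \text{ exactly when } \widetilde{\mathbb{P}}(A) = 0, \quad \forall A \in \mathcal{F}.$$
If $\mathbb{P}$ and $\widetilde{\mathbb{P}}$ are equivalent and $Z$ is the Radon-Nikodym derivative of $\widetilde{\mathbb{P}}$ w.r.t. $\mathbb{P}$, then $\frac{1}{Z}$ is the
Radon-Nikodym derivative of $\mathbb{P}$ w.r.t. $\widetilde{\mathbb{P}}$, i.e.,
$$\widetilde{\mathbb{E}} X = \mathbb{E}[XZ] \quad \forall X, \tag{1.2}$$
$$\mathbb{E} Y = \widetilde{\mathbb{E}}\left[Y \cdot \tfrac{1}{Z}\right] \quad \forall Y. \tag{1.3}$$
(Let $X$ and $Y$ be related by the equation $Y = XZ$ to see that (1.2) and (1.3) are the same.)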
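Example 9.1 (Radon-Nikodym Theorem) Let $\Omega = \{HH, HT, TH, TT\}$, the set of coin toss sequences
of length 2. Let $\mathbb{P}$ correspond to probability $\frac{1}{3}$ for $H$ and $\frac{2}{3}$ for $T$, and let $\widetilde{\mathbb{P}}$ correspond to probability $\frac{1}{2}$ for
$H$ and $\frac{1}{2}$ for $T$. Then $Z(\omega) = \frac{\widetilde{\mathbb{P}}(\omega)}{\mathbb{P}(\omega)}$, so
$$Z(HH) = \frac{9}{4}, \quad Z(HT) = \frac{9}{8}, \quad Z(TH) = \frac{9}{8}, \quad Z(TT) = \frac{9}{16}.$$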
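Example 9.1 in code, using exact fractions. The random variable `X` (set here to the two-period stock prices $16, 4, 4, 1$ of Figure 9.2) is an arbitrary choice made only to test (1.2) and (1.3).

```python
from fractions import Fraction
from itertools import product

p_market = {"H": Fraction(1, 3), "T": Fraction(2, 3)}        # P
p_risk_neutral = {"H": Fraction(1, 2), "T": Fraction(1, 2)}  # P~
omega = ["".join(w) for w in product("HT", repeat=2)]


def prob(w, per_toss):
    """Probability of the toss sequence w under independent tosses."""
    out = Fraction(1)
    for toss in w:
        out *= per_toss[toss]
    return out


Z = {w: prob(w, p_risk_neutral) / prob(w, p_market) for w in omega}

X = {"HH": 16, "HT": 4, "TH": 4, "TT": 1}        # arbitrary test variable
lhs = sum(X[w] * prob(w, p_risk_neutral) for w in omega)       # E~ X
rhs = sum(X[w] * Z[w] * prob(w, p_market) for w in omega)      # E[X Z], (1.2)
e_market = sum(X[w] * prob(w, p_market) for w in omega)        # E X
e_via_inverse = sum(X[w] / Z[w] * prob(w, p_risk_neutral) for w in omega)  # (1.3)
print(Z)
print(lhs, rhs, e_market, e_via_inverse)
```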


9.2 Radon-Nikodym Martingales 


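Let $\Omega$ be the set of all sequences of $n$ coin tosses. Let $\mathbb{P}$ be the market probability measure and let
$\widetilde{\mathbb{P}}$ be the risk-neutral probability measure. Assume
$$\mathbb{P}(\omega) > 0, \quad \widetilde{\mathbb{P}}(\omega) > 0, \quad \forall \omega \in \Omega,$$
so that $\mathbb{P}$ and $\widetilde{\mathbb{P}}$ are equivalent. The Radon-Nikodym derivative of $\widetilde{\mathbb{P}}$ with respect to $\mathbb{P}$ is
$$Z(\omega) = \frac{\widetilde{\mathbb{P}}(\omega)}{\mathbb{P}(\omega)}.$$
Define the $\mathbb{P}$-martingale
$$Z_k \triangleq \mathbb{E}[Z \mid \mathcal{F}_k], \quad k = 0, 1, \ldots, n.$$
We can check that $Z_k$ is indeed a martingale:
$$\mathbb{E}[Z_{k+1} \mid \mathcal{F}_k] = \mathbb{E}\left[\mathbb{E}[Z \mid \mathcal{F}_{k+1}] \mid \mathcal{F}_k\right] = \mathbb{E}[Z \mid \mathcal{F}_k] = Z_k.$$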


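Lemma 2.28 If $X$ is $\mathcal{F}_k$-measurable, then $\widetilde{\mathbb{E}} X = \mathbb{E}[X Z_k]$.

Proof:
$$\widetilde{\mathbb{E}} X = \mathbb{E}[XZ] = \mathbb{E}\left[\mathbb{E}[XZ \mid \mathcal{F}_k]\right] = \mathbb{E}\left[X\,\mathbb{E}[Z \mid \mathcal{F}_k]\right] = \mathbb{E}[X Z_k].$$
∎

Note that Lemma 2.28 implies that if $X$ is $\mathcal{F}_k$-measurable, then for any $A \in \mathcal{F}_k$,
$$\widetilde{\mathbb{E}}[\mathbb{1}_A X] = \mathbb{E}[Z_k \mathbb{1}_A X],$$
or equivalently,
$$\int_A X\, d\widetilde{\mathbb{P}} = \int_A X Z_k\, d\mathbb{P}.$$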
Figure 9.1: Showing the $Z_k$ values in the 2-period binomial model example: $Z_0 = 1$; $Z_1(H) = 3/2$, $Z_1(T) = 3/4$; $Z_2(HH) = 9/4$, $Z_2(HT) = Z_2(TH) = 9/8$, $Z_2(TT) = 9/16$. The probabilities shown (1/3 for $H$, 2/3 for $T$) are for $\mathbb{P}$, not $\widetilde{\mathbb{P}}$.


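Lemma 2.29 If $X$ is $\mathcal{F}_k$-measurable and $0 \le j \le k$, then
$$\widetilde{\mathbb{E}}[X \mid \mathcal{F}_j] = \frac{1}{Z_j}\,\mathbb{E}[X Z_k \mid \mathcal{F}_j].$$
Proof: Note first that $\frac{1}{Z_j}\mathbb{E}[X Z_k \mid \mathcal{F}_j]$ is $\mathcal{F}_j$-measurable. So for any $A \in \mathcal{F}_j$, we have
$$\begin{aligned}
\int_A \frac{1}{Z_j}\,\mathbb{E}[X Z_k \mid \mathcal{F}_j]\, d\widetilde{\mathbb{P}} &= \int_A \mathbb{E}[X Z_k \mid \mathcal{F}_j]\, d\mathbb{P} && \text{(Lemma 2.28)} \\
&= \int_A X Z_k\, d\mathbb{P} && \text{(Partial averaging)} \\
&= \int_A X\, d\widetilde{\mathbb{P}} && \text{(Lemma 2.28)}
\end{aligned}$$
∎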


Example 9.2 (Radon-Nikodym Theorem, continued) We show in Fig. 9.1 the values of the martingale $Z_k$.
We always have $Z_0 = 1$, since
$$Z_0 = \mathbb{E} Z = \int_\Omega Z\, d\mathbb{P} = \widetilde{\mathbb{P}}(\Omega) = 1.$$

9.3 The State Price Density Process 


In order to express the value of a derivative security in terms of the market probabilities, it will be
useful to introduce the following state price density process:
$$\zeta_k \triangleq (1 + r)^{-k} Z_k, \quad k = 0, \ldots, n.$$
We then have the following pricing formulas: For a simple European derivative security with
payoff $C_k$ at time $k$,
$$\begin{aligned}
V_0 &= \widetilde{\mathbb{E}}\left[(1 + r)^{-k} C_k\right] \\
&= \mathbb{E}\left[(1 + r)^{-k} Z_k C_k\right] && \text{(Lemma 2.28)} \\
&= \mathbb{E}[\zeta_k C_k].
\end{aligned}$$


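More generally for $0 \le j \le k$,
$$\begin{aligned}
V_j &= (1 + r)^j\,\widetilde{\mathbb{E}}\left[(1 + r)^{-k} C_k \mid \mathcal{F}_j\right] \\
&= \frac{(1 + r)^j}{Z_j}\,\mathbb{E}\left[(1 + r)^{-k} Z_k C_k \mid \mathcal{F}_j\right] && \text{(Lemma 2.29)} \\
&= \frac{1}{\zeta_j}\,\mathbb{E}[\zeta_k C_k \mid \mathcal{F}_j].
\end{aligned}$$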


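Remark 9.3 $\{\zeta_j V_j\}_{j=0}^k$ is a martingale under $\mathbb{P}$, as we can check below:
$$\mathbb{E}[\zeta_{j+1} V_{j+1} \mid \mathcal{F}_j] = \mathbb{E}\left[\mathbb{E}[\zeta_k C_k \mid \mathcal{F}_{j+1}] \mid \mathcal{F}_j\right] = \mathbb{E}[\zeta_k C_k \mid \mathcal{F}_j] = \zeta_j V_j.$$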


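Now for an American derivative security $\{G_k\}_{k=0}^n$:
$$V_0 = \sup_{\tau \in \mathcal{T}_0} \widetilde{\mathbb{E}}\left[(1 + r)^{-\tau} G_\tau\right] = \sup_{\tau \in \mathcal{T}_0} \mathbb{E}\left[(1 + r)^{-\tau} Z_\tau G_\tau\right] = \sup_{\tau \in \mathcal{T}_0} \mathbb{E}[\zeta_\tau G_\tau].$$
More generally for $0 \le j \le n$,
$$\begin{aligned}
V_j &= (1 + r)^j \sup_{\tau \in \mathcal{T}_j} \widetilde{\mathbb{E}}\left[(1 + r)^{-\tau} G_\tau \mid \mathcal{F}_j\right] \\
&= (1 + r)^j \sup_{\tau \in \mathcal{T}_j} \frac{1}{Z_j}\,\mathbb{E}\left[(1 + r)^{-\tau} Z_\tau G_\tau \mid \mathcal{F}_j\right] \\
&= \frac{1}{\zeta_j} \sup_{\tau \in \mathcal{T}_j} \mathbb{E}[\zeta_\tau G_\tau \mid \mathcal{F}_j].
\end{aligned}$$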


Remark 9.4 Note that
(a) $\{\zeta_j V_j\}_{j=0}^n$ is a supermartingale under $\mathbb{P}$,
(b) $\zeta_j V_j \ge \zeta_j G_j$ for all $j$,
(c) $\{\zeta_j V_j\}_{j=0}^n$ is the smallest process having properties (a) and (b).

Figure 9.2: Showing the state price values $\zeta_k$: $\zeta_0 = 1$; $\zeta_1(H) = 1.20$, $\zeta_1(T) = 0.60$; $\zeta_2(HH) = 1.44$, $\zeta_2(HT) = \zeta_2(TH) = 0.72$, $\zeta_2(TT) = 0.36$. The stock prices are $S_1(H) = 8$, $S_1(T) = 2$, $S_2(HH) = 16$, $S_2(HT) = S_2(TH) = 4$, $S_2(TT) = 1$. The probabilities shown (1/3, 2/3) are for $\mathbb{P}$, not $\widetilde{\mathbb{P}}$.


We interpret $\zeta_k$ by observing that $\zeta_k(\omega)\mathbb{P}(\omega)$ is the value at time zero of a contract which pays one dollar
at time $k$ if $\omega$ occurs.

Example 9.3 (Radon-Nikodym Theorem, continued) We illustrate the use of the valuation formulas for
European and American derivative securities in terms of market probabilities. Recall that $p = \frac{1}{3}$, $q = \frac{2}{3}$. The
state price values $\zeta_k$ are shown in Fig. 9.2.


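For a European call with strike price 5, expiration time 2, we have
$$V_2(HH) = 11, \qquad \zeta_2(HH)V_2(HH) = 1.44 \times 11 = 15.84,$$
$$V_2(HT) = V_2(TH) = V_2(TT) = 0,$$
$$V_0 = \frac{1}{3} \times \frac{1}{3} \times 15.84 = 1.76,$$
$$\frac{\zeta_2(HH)}{\zeta_1(H)}V_2(HH) = \frac{1.44}{1.20} \times 11 = 1.20 \times 11 = 13.20,$$
$$V_1(H) = \frac{1}{3} \times 13.20 = 4.40.$$
Compare with the risk-neutral pricing formulas:
$$V_1(H) = \tfrac{2}{5}V_2(HH) + \tfrac{2}{5}V_2(HT) = \tfrac{2}{5} \times 11 = 4.40,$$
$$V_1(T) = \tfrac{2}{5}V_2(TH) + \tfrac{2}{5}V_2(TT) = 0,$$
$$V_0 = \tfrac{2}{5}V_1(H) + \tfrac{2}{5}V_1(T) = \tfrac{2}{5} \times 4.40 = 1.76.$$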
Now consider an American put with strike price 5 and expiration time 2. Fig. 9.3 shows the values of
$\zeta_k(5 - S_k)^+$. We compute the value of the put under various stopping times $\tau$:
(0) Stop immediately: value is 1.
(1) If $\tau(HH) = \tau(HT) = 2$, $\tau(TH) = \tau(TT) = 1$, the value is
$$\frac{1}{3} \times \frac{2}{3} \times 0.72 + \frac{2}{3} \times 1.80 = 1.36.$$
Figure 9.3: Showing the values $\zeta_k(5 - S_k)^+$ for an American put: $\zeta_0(5 - S_0)^+ = 1$; $\zeta_1(H)(5 - S_1(H))^+ = 0$, $\zeta_1(T)(5 - S_1(T))^+ = 1.80$; $\zeta_2(HH)(5 - S_2(HH))^+ = 0$, $\zeta_2(HT)(5 - S_2(HT))^+ = \zeta_2(TH)(5 - S_2(TH))^+ = 0.72$, $\zeta_2(TT)(5 - S_2(TT))^+ = 1.44$. The probabilities shown are for $\mathbb{P}$, not $\widetilde{\mathbb{P}}$.


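(2) If we stop at time 2, the value is
$$\frac{1}{3} \times \frac{2}{3} \times 0.72 + \frac{2}{3} \times \frac{1}{3} \times 0.72 + \frac{2}{3} \times \frac{2}{3} \times 1.44 = 0.96.$$
We see that (1) is the optimal stopping rule. ∎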
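Example 9.3 can be reproduced with exact fractions. The sketch assumes $S_0 = 4$, $u = 2$, $d = \frac{1}{2}$, $r = \frac{1}{4}$ (consistent with the values in Figure 9.2), computes the call value both from state prices and risk-neutrally, and then evaluates the put under the three stopping rules considered.

```python
from fractions import Fraction
from itertools import product

r = Fraction(1, 4)
P_H = Fraction(1, 3)                        # market probability of H
S0, u, d = Fraction(4), Fraction(2), Fraction(1, 2)
PT_H = (1 + r - d) / (u - d)                # risk-neutral probability of H (= 1/2)


def prob(w, p_h):
    out = Fraction(1)
    for toss in w:
        out *= p_h if toss == "H" else 1 - p_h
    return out


def stock(w):
    s = S0
    for toss in w:
        s *= u if toss == "H" else d
    return s


def zeta(w):
    """State price density zeta_k(w) = (1+r)^{-k} Z_k(w) with k = len(w)."""
    return prob(w, PT_H) / prob(w, P_H) / (1 + r) ** len(w)


paths = ["".join(w) for w in product("HT", repeat=2)]
call_v0 = sum(zeta(w) * max(stock(w) - 5, 0) * prob(w, P_H) for w in paths)
call_v0_rn = sum(prob(w, PT_H) * max(stock(w) - 5, 0) for w in paths) / (1 + r) ** 2


def put_value(tau):
    """E[zeta_tau (5 - S_tau)^+] for a stopping rule given as a dict path -> time."""
    total = Fraction(0)
    for w in paths:
        k = tau[w]
        total += zeta(w[:k]) * max(5 - stock(w[:k]), 0) * prob(w, P_H)
    return total


rules = {
    "stop immediately": {w: 0 for w in paths},
    "rule (1)": {"HH": 2, "HT": 2, "TH": 1, "TT": 1},
    "stop at time 2": {w: 2 for w in paths},
}
print(call_v0, call_v0_rn)
for name, tau in rules.items():
    print(name, put_value(tau))
```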


9.4 Stochastic Volatility Binomial Model 


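Let $\Omega$ be the set of sequences of $n$ tosses, and let $0 < d_k < 1 + r_k < u_k$, where for each $k$, $d_k$, $u_k$, $r_k$
are $\mathcal{F}_k$-measurable. Also let
$$\tilde p_k = \frac{1 + r_k - d_k}{u_k - d_k}, \qquad \tilde q_k = \frac{u_k - (1 + r_k)}{u_k - d_k}.$$
Let $\widetilde{\mathbb{P}}$ be the risk-neutral probability measure:
$$\widetilde{\mathbb{P}}\{\omega_1 = H\} = \tilde p_0, \qquad \widetilde{\mathbb{P}}\{\omega_1 = T\} = \tilde q_0,$$
and for $2 \le k \le n$,
$$\widetilde{\mathbb{P}}[\omega_k = H \mid \mathcal{F}_{k-1}] = \tilde p_{k-1}, \qquad \widetilde{\mathbb{P}}[\omega_k = T \mid \mathcal{F}_{k-1}] = \tilde q_{k-1}.$$
Let $\mathbb{P}$ be the market probability measure, and assume $\mathbb{P}\{\omega\} > 0$ for all $\omega \in \Omega$. Then $\mathbb{P}$ and $\widetilde{\mathbb{P}}$ are
equivalent. Define
$$Z(\omega) = \frac{\widetilde{\mathbb{P}}(\omega)}{\mathbb{P}(\omega)} \quad \forall \omega \in \Omega, \qquad Z_k = \mathbb{E}[Z \mid \mathcal{F}_k], \quad k = 0, 1, \ldots, n.$$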
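The money market process is
$$M_0 = 1, \qquad M_k = (1 + r_{k-1}) M_{k-1}, \quad k = 1, \ldots, n.$$
Note that $M_k$ is $\mathcal{F}_{k-1}$-measurable.

We then define the state price process to be
$$\zeta_k = \frac{1}{M_k} Z_k, \quad k = 0, \ldots, n.$$
As before the portfolio process is $\{\Delta_k\}_{k=0}^{n-1}$. The self-financing value process (wealth process)
consists of $X_0$, the non-random initial wealth, and
$$X_{k+1} = \Delta_k S_{k+1} + (1 + r_k)(X_k - \Delta_k S_k), \quad k = 0, \ldots, n - 1.$$
Then the following processes are martingales under $\widetilde{\mathbb{P}}$:
$$\left\{\frac{1}{M_k} S_k\right\}_{k=0}^n \quad \text{and} \quad \left\{\frac{1}{M_k} X_k\right\}_{k=0}^n,$$
and the following processes are martingales under $\mathbb{P}$:
$$\{\zeta_k S_k\}_{k=0}^n \quad \text{and} \quad \{\zeta_k X_k\}_{k=0}^n.$$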
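We thus have the following pricing formulas:

Simple European derivative security with payoff $C_k$ at time $k$:
$$V_j = M_j\,\widetilde{\mathbb{E}}\left[\frac{C_k}{M_k} \,\Big|\, \mathcal{F}_j\right] = \frac{1}{\zeta_j}\,\mathbb{E}\left[\zeta_k C_k \mid \mathcal{F}_j\right].$$
American derivative security $\{G_k\}_{k=0}^n$:
$$V_j = M_j \sup_{\tau \in \mathcal{T}_j} \widetilde{\mathbb{E}}\left[\frac{G_\tau}{M_\tau} \,\Big|\, \mathcal{F}_j\right] = \frac{1}{\zeta_j} \sup_{\tau \in \mathcal{T}_j} \mathbb{E}\left[\zeta_\tau G_\tau \mid \mathcal{F}_j\right].$$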

The usual hedging portfolio formulas still work. 
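A sketch of the two European pricing formulas in this model, on a hypothetical two-period example in which $u_1$, $d_1$, $r_1$ depend on the first toss. All parameter values, the market probability $\mathbb{P}\{H\} = \frac{1}{3}$ on each toss, and the payoff $(S_2 - 4)^+$ are illustrative assumptions. It uses $Z_k = \widetilde{\mathbb{P}}(\omega_1\ldots\omega_k)/\mathbb{P}(\omega_1\ldots\omega_k)$, which follows from $Z_k = \mathbb{E}[Z \mid \mathcal{F}_k]$ on a finite coin-toss space.

```python
from fractions import Fraction as F
from itertools import product

S0 = F(4)
P_H = F(1, 3)   # hypothetical market probability of H on each toss


def params(prefix):
    """Hypothetical (u_k, d_k, r_k), chosen after observing `prefix` (k = len(prefix))."""
    if prefix == "":
        return F(2), F(1, 2), F(1, 4)
    if prefix == "H":
        return F(3, 2), F(3, 4), F(1, 10)
    return F(3), F(1, 3), F(1, 5)


def walk(w):
    """Return (S_k, M_k, P~(w), P(w)) after the tosses in w."""
    s, m, pt, p = S0, F(1), F(1), F(1)
    for k, toss in enumerate(w):
        u, d, r = params(w[:k])
        p_tilde = (1 + r - d) / (u - d)
        s *= u if toss == "H" else d
        m *= 1 + r
        pt *= p_tilde if toss == "H" else 1 - p_tilde
        p *= P_H if toss == "H" else 1 - P_H
    return s, m, pt, p


def zeta(w):
    """State price zeta_k = Z_k / M_k with Z_k = P~(prefix) / P(prefix)."""
    s, m, pt, p = walk(w)
    return pt / p / m


paths = ["".join(t) for t in product("HT", repeat=2)]
payoff = {w: max(walk(w)[0] - 4, 0) for w in paths}     # hypothetical C_2

v0_risk_neutral = sum(walk(w)[2] * payoff[w] / walk(w)[1] for w in paths)
v0_state_price = sum(walk(w)[3] * zeta(w) * payoff[w] for w in paths)
# {zeta_k S_k} should be a P-martingale: E[zeta_1 S_1] = zeta_0 S_0 = S_0.
martingale_check = sum(walk(t)[3] * zeta(t) * walk(t)[0] for t in "HT")
print(v0_risk_neutral, v0_state_price, martingale_check)
```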
9.5 Another Application of the Radon-Nikodym Theorem


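Let $(\Omega, \mathcal{F}, \mathbb{Q})$ be a probability space. Let $\mathcal{G}$ be a sub-$\sigma$-algebra of $\mathcal{F}$, and let $X$ be a non-negative
random variable with $\int_\Omega X\, d\mathbb{Q} = 1$. We construct the conditional expectation (under $\mathbb{Q}$) of $X$
given $\mathcal{G}$. On $\mathcal{G}$, define two probability measures
$$\mathbb{P}(A) = \mathbb{Q}(A) \quad \forall A \in \mathcal{G};$$
$$\widetilde{\mathbb{P}}(A) = \int_A X\, d\mathbb{Q} \quad \forall A \in \mathcal{G}.$$
Whenever $Y$ is a $\mathcal{G}$-measurable random variable, we have
$$\int_\Omega Y\, d\mathbb{P} = \int_\Omega Y\, d\mathbb{Q};$$
if $Y = \mathbb{1}_A$ for some $A \in \mathcal{G}$, this is just the definition of $\mathbb{P}$, and the rest follows from the "standard
machine". If $A \in \mathcal{G}$ and $\mathbb{P}(A) = 0$, then $\mathbb{Q}(A) = 0$, so $\widetilde{\mathbb{P}}(A) = 0$. In other words, the measure $\widetilde{\mathbb{P}}$
is absolutely continuous with respect to the measure $\mathbb{P}$. The Radon-Nikodym theorem implies that
there exists a $\mathcal{G}$-measurable random variable $Z$ such that
$$\widetilde{\mathbb{P}}(A) = \int_A Z\, d\mathbb{P} \quad \forall A \in \mathcal{G},$$
i.e., using the previous observation with $Y = \mathbb{1}_A Z$,
$$\int_A X\, d\mathbb{Q} = \int_A Z\, d\mathbb{P} = \int_A Z\, d\mathbb{Q} \quad \forall A \in \mathcal{G}.$$
This shows that $Z$ has the "partial averaging" property, and since $Z$ is $\mathcal{G}$-measurable, it is the
conditional expectation (under the probability measure $\mathbb{Q}$) of $X$ given $\mathcal{G}$. The existence of conditional
expectations is a consequence of the Radon-Nikodym theorem.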
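On a finite $\Omega$ with $\mathcal{G}$ generated by a partition, the construction can be carried out by hand: the Radon-Nikodym derivative is constant on each atom and equals $\widetilde{\mathbb{P}}(\text{atom})/\mathbb{P}(\text{atom})$. The sketch uses a hypothetical six-point space with uniform $\mathbb{Q}$ and checks the partial averaging property on every atom.

```python
from fractions import Fraction as F

# Hypothetical example: Omega = {0,...,5}, Q uniform, G generated by the
# partition {0,1,2}, {3,4}, {5}; X >= 0 with E_Q X = 1.
Q = {w: F(1, 6) for w in range(6)}
X = {0: F(1, 2), 1: F(1, 2), 2: F(1, 2), 3: F(2), 4: F(3, 2), 5: F(1)}
atoms = [{0, 1, 2}, {3, 4}, {5}]


def P(A):
    """P = Q restricted to G."""
    return sum(Q[w] for w in A)


def P_tilde(A):
    """P~(A) = integral of X dQ over A."""
    return sum(X[w] * Q[w] for w in A)


# dP~/dP is constant on each atom; the resulting Z is E_Q[X | G].
Z = {w: P_tilde(A) / P(A) for A in atoms for w in A}
E_Q_X = sum(X[w] * Q[w] for w in Q)
print(E_Q_X, Z)
```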


Chapter 10 


Capital Asset Pricing 


10.1 An Optimization Problem 


Consider an agent who has initial wealth $X_0$ and wants to invest in the stock and money markets so
as to maximize
$$\mathbb{E} \log X_n.$$

Remark 10.1 Regardless of the portfolio used by the agent, $\{\zeta_k X_k\}_{k=0}^n$ is a martingale under $\mathbb{P}$, so
$$\mathbb{E}\,\zeta_n X_n = X_0. \tag{BC}$$
Here, (BC) stands for "Budget Constraint".


Remark 10.2 If $\xi$ is any random variable satisfying (BC), i.e.,
$$\mathbb{E}\,\zeta_n \xi = X_0,$$
then there is a portfolio which starts with initial wealth $X_0$ and produces $X_n = \xi$ at time $n$. To see
this, just regard $\xi$ as a simple European derivative security paying off at time $n$. Then $X_0$ is its value
at time 0, and starting from this value, there is a hedging portfolio which produces $X_n = \xi$.


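Remarks 10.1 and 10.2 show that the optimal $X_n$ for the capital asset pricing problem can be
obtained by solving the following

Constrained Optimization Problem:

Find a random variable $\xi$ which solves:
$$\text{Maximize } \mathbb{E} \log \xi \quad \text{subject to } \mathbb{E}\,\zeta_n \xi = X_0.$$
Equivalently, we wish to
$$\text{Maximize } \sum_{\omega \in \Omega} (\log \xi(\omega))\,\mathbb{P}(\omega) \quad \text{subject to } \sum_{\omega \in \Omega} \zeta_n(\omega)\xi(\omega)\mathbb{P}(\omega) - X_0 = 0.$$
There are $2^n$ sequences $\omega$ in $\Omega$. Call them $\omega_1, \omega_2, \ldots, \omega_{2^n}$. Adopt the notation
$$x_1 = \xi(\omega_1), \quad x_2 = \xi(\omega_2), \quad \ldots, \quad x_{2^n} = \xi(\omega_{2^n}).$$
We can thus restate the problem as:
$$\text{Maximize } \sum_{k=1}^{2^n} (\log x_k)\,\mathbb{P}(\omega_k) \quad \text{subject to } \sum_{k=1}^{2^n} \zeta_n(\omega_k)x_k\mathbb{P}(\omega_k) - X_0 = 0.$$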


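In order to solve this problem we use:

Theorem 1.30 (Lagrange Multiplier) If $(x_1^*, \ldots, x_m^*)$ solve the problem
$$\text{Maximize } f(x_1, \ldots, x_m) \quad \text{subject to } g(x_1, \ldots, x_m) = 0,$$
then there is a number $\lambda$ such that
$$\frac{\partial f}{\partial x_k}(x_1^*, \ldots, x_m^*) = \lambda\,\frac{\partial g}{\partial x_k}(x_1^*, \ldots, x_m^*), \quad k = 1, \ldots, m, \tag{1.1}$$
and
$$g(x_1^*, \ldots, x_m^*) = 0. \tag{1.2}$$
For our problem, (1.1) and (1.2) become
$$\frac{1}{x_k^*}\,\mathbb{P}(\omega_k) = \lambda\,\zeta_n(\omega_k)\mathbb{P}(\omega_k), \quad k = 1, \ldots, 2^n, \tag{1.1'}$$
$$\sum_{k=1}^{2^n} \zeta_n(\omega_k) x_k^* \mathbb{P}(\omega_k) = X_0. \tag{1.2'}$$
Equation (1.1′) implies
$$x_k^* = \frac{1}{\lambda\,\zeta_n(\omega_k)}.$$
Plugging this into (1.2′) we get
$$\frac{1}{\lambda}\sum_{k=1}^{2^n} \mathbb{P}(\omega_k) = X_0 \implies \frac{1}{\lambda} = X_0.$$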
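Therefore,
$$x_k^* = \frac{X_0}{\zeta_n(\omega_k)}, \quad k = 1, \ldots, 2^n.$$
Thus we have shown that if $\xi^*$ solves the problem
$$\text{Maximize } \mathbb{E} \log \xi \quad \text{subject to } \mathbb{E}(\zeta_n \xi) = X_0, \tag{1.3}$$
then
$$\xi^* = \frac{X_0}{\zeta_n}. \tag{1.4}$$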
Gn 
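Theorem 1.31 If $\xi^*$ is given by (1.4), then $\xi^*$ solves the problem (1.3).

Proof: Fix $Z > 0$ and define
$$f(x) = \log x - xZ.$$
We maximize $f$ over $x > 0$:
$$f'(x) = \frac{1}{x} - Z = 0 \iff x = \frac{1}{Z}, \qquad f''(x) = -\frac{1}{x^2} < 0 \quad \forall x > 0.$$
The function $f$ is maximized at $x^* = \frac{1}{Z}$, i.e.,
$$\log x - xZ \le f(x^*) = \log\frac{1}{Z} - 1, \quad \forall x > 0,\ \forall Z > 0. \tag{1.5}$$
Let $\xi$ be any random variable satisfying
$$\mathbb{E}(\zeta_n \xi) = X_0$$
and let
$$\xi^* = \frac{X_0}{\zeta_n}.$$
From (1.5), with $x = \xi$ and $Z = \zeta_n / X_0$, we have
$$\log \xi - \xi\,\frac{\zeta_n}{X_0} \le \log\frac{X_0}{\zeta_n} - 1 = \log \xi^* - 1.$$
Taking expectations, we have
$$\mathbb{E} \log \xi - \frac{1}{X_0}\,\mathbb{E}(\zeta_n \xi) \le \mathbb{E} \log \xi^* - 1,$$
and so, since $\mathbb{E}(\zeta_n \xi) = X_0$,
$$\mathbb{E} \log \xi \le \mathbb{E} \log \xi^*.$$
∎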
In summary, capital asset pricing works as follows: Consider an agent who has initial wealth $X_0$
and wants to invest in the stock and money market so as to maximize
$$\mathbb{E} \log X_n.$$
The optimal $X_n$ is $X_n = \frac{X_0}{\zeta_n}$, i.e.,
$$\zeta_n X_n = X_0.$$
Since $\{\zeta_k X_k\}_{k=0}^n$ is a martingale under $\mathbb{P}$, we have
$$\zeta_k X_k = \mathbb{E}[\zeta_n X_n \mid \mathcal{F}_k] = X_0, \quad k = 0, \ldots, n,$$
so
$$X_k = \frac{X_0}{\zeta_k},$$
and the optimal portfolio is given by
$$\Delta_k(\omega_1, \ldots, \omega_k) = \frac{\dfrac{X_0}{\zeta_{k+1}(\omega_1, \ldots, \omega_k, H)} - \dfrac{X_0}{\zeta_{k+1}(\omega_1, \ldots, \omega_k, T)}}{S_{k+1}(\omega_1, \ldots, \omega_k, H) - S_{k+1}(\omega_1, \ldots, \omega_k, T)}.$$
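A sketch of the log-optimal solution on the two-period example of Figures 9.1 and 9.2, assuming $S_0 = 4$, $u = 2$, $d = \frac{1}{2}$, $r = \frac{1}{4}$, $\mathbb{P}\{H\} = \frac{1}{3}$ and a hypothetical initial wealth $X_0 = 1$. It checks the budget constraint, compares $\mathbb{E}\log\xi^*$ with the all-money-market alternative $\xi = X_0(1+r)^2$ (which also satisfies (BC)), and verifies that $\Delta_0$ from the formula above produces $X_1 = X_0/\zeta_1$.

```python
import math
from fractions import Fraction as F
from itertools import product

S0, u, d, r = F(4), F(2), F(1, 2), F(1, 4)
P_H, PT_H = F(1, 3), F(1, 2)
X0 = F(1)                          # hypothetical initial wealth


def prob(w, p_h):
    out = F(1)
    for t in w:
        out *= p_h if t == "H" else 1 - p_h
    return out


def zeta(w):
    """State price density for the prefix w."""
    return prob(w, PT_H) / prob(w, P_H) / (1 + r) ** len(w)


def stock(w):
    s = S0
    for t in w:
        s *= u if t == "H" else d
    return s


paths = ["".join(t) for t in product("HT", repeat=2)]
xi_star = {w: X0 / zeta(w) for w in paths}                         # (1.4)
budget = sum(prob(w, P_H) * zeta(w) * xi_star[w] for w in paths)   # should be X0


def expected_log(xi):
    return sum(float(prob(w, P_H)) * math.log(xi[w]) for w in paths)


xi_bank = {w: X0 * (1 + r) ** 2 for w in paths}   # another wealth meeting (BC)

# Optimal time-0 stock holding and the wealth it produces at time 1.
delta0 = (X0 / zeta("H") - X0 / zeta("T")) / (stock("H") - stock("T"))
X1 = {t: delta0 * stock(t) + (1 + r) * (X0 - delta0 * S0) for t in "HT"}
print(budget, expected_log(xi_star), expected_log(xi_bank))
print(delta0, X1)
```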


Chapter 11 


General Random Variables 


11.1 Law of a Random Variable 


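Thus far we have considered only random variables whose domain and range are discrete. We now
consider a general random variable $X : \Omega \to \mathbb{R}$ defined on the probability space $(\Omega, \mathcal{F}, \mathbb{P})$. Recall
that:

• $\mathcal{F}$ is a $\sigma$-algebra of subsets of $\Omega$.
• $\mathbb{P}$ is a probability measure on $\mathcal{F}$, i.e., $\mathbb{P}(A)$ is defined for every $A \in \mathcal{F}$.

A function $X : \Omega \to \mathbb{R}$ is a random variable if and only if for every $B \in \mathcal{B}(\mathbb{R})$ (the $\sigma$-algebra of
Borel subsets of $\mathbb{R}$), the set
$$\{X \in B\} \triangleq X^{-1}(B) \triangleq \{\omega : X(\omega) \in B\} \in \mathcal{F},$$
i.e., $X : \Omega \to \mathbb{R}$ is a random variable if and only if $X^{-1}$ is a function from $\mathcal{B}(\mathbb{R})$ to $\mathcal{F}$ (see Fig. 11.1).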


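Thus any random variable $X$ induces a measure $\mu_X$ on the measurable space $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ defined
by
$$\mu_X(B) = \mathbb{P}\left(X^{-1}(B)\right) \quad \forall B \in \mathcal{B}(\mathbb{R}),$$
where the probability on the right is defined since $X^{-1}(B) \in \mathcal{F}$. $\mu_X$ is often called the Law of $X$;
in Williams' book this is denoted by $\mathcal{L}_X$.

11.2 Density of a Random Variable

The density of $X$ (if it exists) is a function $f_X : \mathbb{R} \to [0, \infty)$ such that
$$\mu_X(B) = \int_B f_X(x)\, dx \quad \forall B \in \mathcal{B}(\mathbb{R}).$$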
Figure 11.1: Illustrating a real-valued random variable $X$.


We then write
$$d\mu_X(x) = f_X(x)\, dx,$$
where the integral is with respect to the Lebesgue measure on $\mathbb{R}$. $f_X$ is the Radon-Nikodym derivative
of $\mu_X$ with respect to the Lebesgue measure. Thus $X$ has a density if and only if $\mu_X$ is
absolutely continuous with respect to Lebesgue measure, which means that whenever $B \in \mathcal{B}(\mathbb{R})$
has Lebesgue measure zero, then
$$\mathbb{P}\{X \in B\} = 0.$$


11.3 Expectation 


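Theorem 3.32 (Expectation of a function of $X$) Let $h : \mathbb{R} \to \mathbb{R}$ be given. Then
$$\begin{aligned}
\mathbb{E}\, h(X) &\triangleq \int_\Omega h(X(\omega))\, d\mathbb{P}(\omega) \\
&= \int_{\mathbb{R}} h(x)\, d\mu_X(x) \\
&= \int_{\mathbb{R}} h(x) f_X(x)\, dx.
\end{aligned}$$
Proof: (Sketch). If $h(x) = \mathbb{1}_B(x)$ for some $B \subset \mathbb{R}$, then these equations are
$$\mathbb{E}\,\mathbb{1}_B(X) \triangleq \mathbb{P}\{X \in B\} = \mu_X(B) = \int_B f_X(x)\, dx,$$
which are true by definition. Now use the "standard machine" to get the equations for general $h$.
∎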
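Theorem 3.32 can be illustrated numerically. The sketch assumes, for concreteness, that $X$ is exponential with density $f_X(x) = e^{-x}$ on $[0, \infty)$ and $h(x) = x^2$, so that $\mathbb{E}\,h(X) = 2$. It compares a Monte Carlo average of $h(X(\omega))$ (the integral over $\Omega$) with a midpoint-rule approximation of $\int h(x) f_X(x)\,dx$ (the integral over $\mathbb{R}$).

```python
import math
import random


def h(x):
    return x * x


def f_X(x):
    """Hypothetical density: Exponential(1)."""
    return math.exp(-x) if x >= 0 else 0.0


def integral_h_f(upper=50.0, n=200_000):
    """Midpoint rule for the integral of h(x) f_X(x) dx over [0, upper]."""
    dx = upper / n
    return sum(h((i + 0.5) * dx) * f_X((i + 0.5) * dx) for i in range(n)) * dx


def monte_carlo_E_h(n=200_000, seed=1):
    """Average of h(X(omega)) over simulated outcomes omega."""
    rng = random.Random(seed)
    return sum(h(rng.expovariate(1.0)) for _ in range(n)) / n


quad_value = integral_h_f()
mc_value = monte_carlo_E_h()
print(quad_value, mc_value)
```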
Figure 11.2: Two real-valued random variables $X$, $Y$.


11.4 Two random variables 


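Let $X, Y$ be two random variables $\Omega \to \mathbb{R}$ defined on the space $(\Omega, \mathcal{F}, \mathbb{P})$. Then $X, Y$ induce a
measure on $\mathcal{B}(\mathbb{R}^2)$ (see Fig. 11.2) called the joint law of $(X, Y)$, defined by
$$\mu_{X,Y}(C) \triangleq \mathbb{P}\{(X, Y) \in C\} \quad \forall C \in \mathcal{B}(\mathbb{R}^2).$$
The joint density of $(X, Y)$ is a function
$$f_{X,Y} : \mathbb{R}^2 \to [0, \infty)$$
that satisfies
$$\mu_{X,Y}(C) = \iint_C f_{X,Y}(x, y)\, dx\, dy \quad \forall C \in \mathcal{B}(\mathbb{R}^2).$$
$f_{X,Y}$ is the Radon-Nikodym derivative of $\mu_{X,Y}$ with respect to the Lebesgue measure (area) on $\mathbb{R}^2$.

We compute the expectation of a function of $X, Y$ in a manner analogous to the univariate case:
$$\begin{aligned}
\mathbb{E}\, k(X, Y) &\triangleq \int_\Omega k(X(\omega), Y(\omega))\, d\mathbb{P}(\omega) \\
&= \iint_{\mathbb{R}^2} k(x, y)\, d\mu_{X,Y}(x, y) \\
&= \iint_{\mathbb{R}^2} k(x, y) f_{X,Y}(x, y)\, dx\, dy.
\end{aligned}$$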
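11.5 Marginal Density

Suppose $(X, Y)$ has joint density $f_{X,Y}$. Let $B \subset \mathbb{R}$ be given. Then
$$\begin{aligned}
\mu_Y(B) &= \mathbb{P}\{Y \in B\} \\
&= \mathbb{P}\{(X, Y) \in \mathbb{R} \times B\} \\
&= \mu_{X,Y}(\mathbb{R} \times B) \\
&= \int_B \int_{\mathbb{R}} f_{X,Y}(x, y)\, dx\, dy \\
&= \int_B f_Y(y)\, dy,
\end{aligned}$$
where
$$f_Y(y) \triangleq \int_{\mathbb{R}} f_{X,Y}(x, y)\, dx.$$
Therefore, $f_Y(y)$ is the (marginal) density for $Y$.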


11.6 Conditional Expectation 


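Suppose $(X, Y)$ has joint density $f_{X,Y}$. Let $h : \mathbb{R} \to \mathbb{R}$ be given. Recall that $\mathbb{E}[h(X) \mid Y] \triangleq
\mathbb{E}[h(X) \mid \sigma(Y)]$ depends on $\omega$ through $Y$, i.e., there is a function $g(y)$ ($g$ depending on $h$) such that
$$\mathbb{E}[h(X) \mid Y](\omega) = g(Y(\omega)).$$
How do we determine $g$?

We can characterize $g$ using partial averaging: Recall that $A \in \sigma(Y) \iff A = \{Y \in B\}$ for some
$B \in \mathcal{B}(\mathbb{R})$. Then the following are equivalent characterizations of $g$:
$$\int_A g(Y)\, d\mathbb{P} = \int_A h(X)\, d\mathbb{P} \quad \forall A \in \sigma(Y), \tag{6.1}$$
$$\int_\Omega \mathbb{1}_B(Y) g(Y)\, d\mathbb{P} = \int_\Omega \mathbb{1}_B(Y) h(X)\, d\mathbb{P} \quad \forall B \in \mathcal{B}(\mathbb{R}), \tag{6.2}$$
$$\int_{\mathbb{R}} \mathbb{1}_B(y) g(y)\, \mu_Y(dy) = \iint_{\mathbb{R}^2} \mathbb{1}_B(y) h(x)\, d\mu_{X,Y}(x, y) \quad \forall B \in \mathcal{B}(\mathbb{R}), \tag{6.3}$$
$$\int_B g(y) f_Y(y)\, dy = \int_B \int_{\mathbb{R}} h(x) f_{X,Y}(x, y)\, dx\, dy \quad \forall B \in \mathcal{B}(\mathbb{R}). \tag{6.4}$$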
11.7 Conditional Density

A function $f_{X|Y}(x|y) : \mathbb{R}^2 \to [0, \infty)$ is called a conditional density for $X$ given $Y$ provided that for
any function $h : \mathbb{R} \to \mathbb{R}$:
$$g(y) = \int_{\mathbb{R}} h(x) f_{X|Y}(x|y)\, dx. \tag{7.1}$$
(Here $g$ is the function satisfying
$$\mathbb{E}[h(X) \mid Y] = g(Y),$$
and $g$ depends on $h$, but $f_{X|Y}$ does not.)

Theorem 7.33 If $(X, Y)$ has a joint density $f_{X,Y}$, then
$$f_{X|Y}(x|y) = \frac{f_{X,Y}(x, y)}{f_Y(y)}. \tag{7.2}$$
Proof: Just verify that $g$ defined by (7.1) satisfies (6.4): For $B \in \mathcal{B}(\mathbb{R})$,
$$\int_B \underbrace{\int_{\mathbb{R}} h(x) f_{X|Y}(x|y)\, dx}_{g(y)}\; f_Y(y)\, dy = \int_B \int_{\mathbb{R}} h(x) f_{X,Y}(x, y)\, dx\, dy.$$
∎

Notation 11.1 Let $g$ be the function satisfying
$$\mathbb{E}[h(X) \mid Y] = g(Y).$$
The function $g$ is often written as
$$g(y) = \mathbb{E}[h(X) \mid Y = y],$$
and (7.1) becomes
$$\mathbb{E}[h(X) \mid Y = y] = \int_{\mathbb{R}} h(x) f_{X|Y}(x|y)\, dx.$$
In conclusion, to determine $\mathbb{E}[h(X) \mid Y]$ (a function of $\omega$), first compute
$$g(y) = \int_{\mathbb{R}} h(x) f_{X|Y}(x|y)\, dx,$$
and then replace the dummy variable $y$ by the random variable $Y$:
$$\mathbb{E}[h(X) \mid Y](\omega) = g(Y(\omega)).$$


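Example 11.1 (Jointly normal random variables) Given parameters: $\sigma_1 > 0$, $\sigma_2 > 0$, $-1 < \rho < 1$. Let
$(X, Y)$ have the joint density
$$f_{X,Y}(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}} \exp\left\{-\frac{1}{2(1 - \rho^2)}\left[\frac{x^2}{\sigma_1^2} - 2\rho\frac{xy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2}\right]\right\}.$$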
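The exponent is
$$\begin{aligned}
-\frac{1}{2(1 - \rho^2)}\left[\frac{x^2}{\sigma_1^2} - 2\rho\frac{xy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2}\right] &= -\frac{1}{2(1 - \rho^2)}\left[\left(\frac{x}{\sigma_1} - \frac{\rho y}{\sigma_2}\right)^2 + (1 - \rho^2)\frac{y^2}{\sigma_2^2}\right] \\
&= -\frac{1}{2(1 - \rho^2)\sigma_1^2}\left(x - \frac{\rho\sigma_1}{\sigma_2}y\right)^2 - \frac{y^2}{2\sigma_2^2}.
\end{aligned}$$
We can compute the marginal density of $Y$ as follows:
$$\begin{aligned}
f_Y(y) &= \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}} \int_{-\infty}^\infty e^{-\frac{1}{2(1-\rho^2)\sigma_1^2}\left(x - \frac{\rho\sigma_1}{\sigma_2}y\right)^2} dx \cdot e^{-\frac{y^2}{2\sigma_2^2}} \\
&= \frac{1}{2\pi\sigma_2} \int_{-\infty}^\infty e^{-\frac{u^2}{2}}\, du \cdot e^{-\frac{y^2}{2\sigma_2^2}}
\end{aligned}$$
using the substitution $u = \frac{1}{\sigma_1\sqrt{1 - \rho^2}}\left(x - \frac{\rho\sigma_1}{\sigma_2}y\right)$, $du = \frac{dx}{\sigma_1\sqrt{1 - \rho^2}}$,
$$f_Y(y) = \frac{1}{\sqrt{2\pi}\,\sigma_2} e^{-\frac{y^2}{2\sigma_2^2}}.$$
Thus $Y$ is normal with mean 0 and variance $\sigma_2^2$.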
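Conditional density. From the expressions
$$f_{X,Y}(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}}\, e^{-\frac{1}{2(1-\rho^2)\sigma_1^2}\left(x - \frac{\rho\sigma_1}{\sigma_2}y\right)^2} e^{-\frac{y^2}{2\sigma_2^2}}, \qquad f_Y(y) = \frac{1}{\sqrt{2\pi}\,\sigma_2} e^{-\frac{y^2}{2\sigma_2^2}},$$
we have
$$f_{X|Y}(x|y) = \frac{f_{X,Y}(x, y)}{f_Y(y)} = \frac{1}{\sqrt{2\pi}\,\sigma_1\sqrt{1 - \rho^2}}\, e^{-\frac{1}{2(1-\rho^2)\sigma_1^2}\left(x - \frac{\rho\sigma_1}{\sigma_2}y\right)^2}.$$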

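In the $x$-variable, $f_{X|Y}(x|y)$ is a normal density with mean $\frac{\rho\sigma_1}{\sigma_2}y$ and variance $(1 - \rho^2)\sigma_1^2$. Therefore,
$$\mathbb{E}[X \mid Y = y] = \int_{-\infty}^\infty x f_{X|Y}(x|y)\, dx = \frac{\rho\sigma_1}{\sigma_2}y,$$
$$\mathbb{E}\left[\left(X - \frac{\rho\sigma_1}{\sigma_2}y\right)^2 \,\Big|\, Y = y\right] = \int_{-\infty}^\infty \left(x - \frac{\rho\sigma_1}{\sigma_2}y\right)^2 f_{X|Y}(x|y)\, dx = (1 - \rho^2)\sigma_1^2.$$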
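From the above two formulas we have the formulas
$$\mathbb{E}[X \mid Y] = \frac{\rho\sigma_1}{\sigma_2}Y, \tag{7.3}$$
$$\mathbb{E}\left[\left(X - \frac{\rho\sigma_1}{\sigma_2}Y\right)^2 \,\Big|\, Y\right] = (1 - \rho^2)\sigma_1^2. \tag{7.4}$$
Taking expectations in (7.3) and (7.4) yields
$$\mathbb{E}X = \frac{\rho\sigma_1}{\sigma_2}\mathbb{E}Y = 0, \tag{7.5}$$
$$\mathbb{E}\left[\left(X - \frac{\rho\sigma_1}{\sigma_2}Y\right)^2\right] = (1 - \rho^2)\sigma_1^2. \tag{7.6}$$
Based on $Y$, the best estimator of $X$ is $\frac{\rho\sigma_1}{\sigma_2}Y$. This estimator is unbiased (has expected error zero) and the
expected square error is $(1 - \rho^2)\sigma_1^2$. No other estimator based on $Y$ can have a smaller expected square error
(Homework problem 2.1). ∎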
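A Monte Carlo sketch of the estimation claim, with hypothetical parameters $\sigma_1 = 2$, $\sigma_2 = 0.5$, $\rho = 0.6$. It draws $(X, Y)$ as $Y = \sigma_2 Z_2$, $X = \sigma_1(\rho Z_2 + \sqrt{1-\rho^2}\,Z_1)$ for independent standard normals $Z_1, Z_2$ (a standard way to produce the joint density of Example 11.1), and compares the mean square error of $\frac{\rho\sigma_1}{\sigma_2}Y$ with $(1-\rho^2)\sigma_1^2$ and with that of a deliberately mis-scaled estimator.

```python
import math
import random

sigma1, sigma2, rho = 2.0, 0.5, 0.6     # hypothetical parameter values


def sample_pair(rng):
    """Draw (X, Y) with mean 0, variances sigma1^2, sigma2^2 and correlation rho."""
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    y = sigma2 * z2
    x = sigma1 * (rho * z2 + math.sqrt(1.0 - rho * rho) * z1)
    return x, y


rng = random.Random(42)
pairs = [sample_pair(rng) for _ in range(200_000)]
beta = rho * sigma1 / sigma2            # E[X | Y] = beta * Y by (7.3)

mean_error = sum(x - beta * y for x, y in pairs) / len(pairs)
mse_best = sum((x - beta * y) ** 2 for x, y in pairs) / len(pairs)
mse_other = sum((x - 0.5 * beta * y) ** 2 for x, y in pairs) / len(pairs)
print(mean_error, mse_best, (1 - rho ** 2) * sigma1 ** 2, mse_other)
```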


11.8 Multivariate Normal Distribution 


Please see Oksendal Appendix A. 


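Let $\mathbf{X}$ denote the column vector of random variables $(X_1, X_2, \ldots, X_n)^T$, and $\mathbf{x}$ the corresponding
column vector of values $(x_1, x_2, \ldots, x_n)^T$. $\mathbf{X}$ has a multivariate normal distribution if and only if
the random variables have the joint density
$$f_{\mathbf{X}}(\mathbf{x}) = \frac{\sqrt{\det A}}{(2\pi)^{n/2}} \exp\left\{-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^T A (\mathbf{x} - \boldsymbol{\mu})\right\}.$$
Here,
$$\boldsymbol{\mu} \triangleq (\mu_1, \ldots, \mu_n)^T = \mathbb{E}\mathbf{X} \triangleq (\mathbb{E}X_1, \ldots, \mathbb{E}X_n)^T,$$
and $A$ is an $n \times n$ symmetric positive definite (in particular nonsingular) matrix. $A^{-1}$ is the covariance matrix
$$A^{-1} = \mathbb{E}\left[(\mathbf{X} - \boldsymbol{\mu})(\mathbf{X} - \boldsymbol{\mu})^T\right],$$
i.e., the $(i, j)$th element of $A^{-1}$ is $\mathbb{E}(X_i - \mu_i)(X_j - \mu_j)$. The random variables in $\mathbf{X}$ are independent
if and only if $A^{-1}$ is diagonal, i.e.,
$$A^{-1} = \operatorname{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2),$$
where $\sigma_j^2 = \mathbb{E}(X_j - \mu_j)^2$ is the variance of $X_j$.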
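11.9 Bivariate normal distribution

Take $n = 2$ in the above definitions, and let
$$\rho \triangleq \frac{\mathbb{E}(X_1 - \mu_1)(X_2 - \mu_2)}{\sigma_1\sigma_2}.$$
Thus,
$$A^{-1} = \begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix}, \qquad A = \begin{bmatrix} \dfrac{1}{\sigma_1^2(1 - \rho^2)} & \dfrac{-\rho}{\sigma_1\sigma_2(1 - \rho^2)} \\ \dfrac{-\rho}{\sigma_1\sigma_2(1 - \rho^2)} & \dfrac{1}{\sigma_2^2(1 - \rho^2)} \end{bmatrix},$$
$$\sqrt{\det A} = \frac{1}{\sigma_1\sigma_2\sqrt{1 - \rho^2}},$$
and we have the formula from Example 11.1, adjusted to account for the possibly non-zero expectations:
$$f_{X_1,X_2}(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}} \exp\left\{-\frac{1}{2(1 - \rho^2)}\left[\frac{(x_1 - \mu_1)^2}{\sigma_1^2} - \frac{2\rho(x_1 - \mu_1)(x_2 - \mu_2)}{\sigma_1\sigma_2} + \frac{(x_2 - \mu_2)^2}{\sigma_2^2}\right]\right\}.$$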
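A small consistency check with hypothetical parameter values: building $A$ as the inverse of the covariance matrix and evaluating the general density with $n = 2$ should reproduce the explicit bivariate formula, and $\sqrt{\det A}$ should equal $\frac{1}{\sigma_1\sigma_2\sqrt{1-\rho^2}}$.

```python
import math

mu1, mu2, s1, s2, rho = 1.0, -2.0, 1.5, 0.8, -0.3      # hypothetical parameters

# Covariance matrix A^{-1} and its inverse A (2x2 inverse via the adjugate).
cov = [[s1 * s1, rho * s1 * s2], [rho * s1 * s2, s2 * s2]]
det_cov = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
A = [[cov[1][1] / det_cov, -cov[0][1] / det_cov],
     [-cov[1][0] / det_cov, cov[0][0] / det_cov]]
det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]


def density_general(x1, x2):
    """sqrt(det A) / (2 pi)^{n/2} * exp(-(x - mu)^T A (x - mu) / 2), n = 2."""
    v = (x1 - mu1, x2 - mu2)
    quad = sum(v[i] * A[i][j] * v[j] for i in range(2) for j in range(2))
    return math.sqrt(det_A) / (2 * math.pi) * math.exp(-0.5 * quad)


def density_bivariate(x1, x2):
    """The explicit bivariate normal formula of Section 11.9."""
    a, b = (x1 - mu1) / s1, (x2 - mu2) / s2
    q = (a * a - 2 * rho * a * b + b * b) / (1 - rho * rho)
    return math.exp(-0.5 * q) / (2 * math.pi * s1 * s2 * math.sqrt(1 - rho * rho))


for point in [(0.0, 0.0), (1.0, -2.0), (2.5, -1.0)]:
    print(point, density_general(*point), density_bivariate(*point))
print(math.sqrt(det_A), 1 / (s1 * s2 * math.sqrt(1 - rho * rho)))
```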


11.10 MGF of jointly normal random variables 


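Let $\mathbf{u} = (u_1, u_2, \ldots, u_n)^T$ denote a column vector with components in $\mathbb{R}$, and let $\mathbf{X}$ have a
multivariate normal distribution with covariance matrix $A^{-1}$ and mean vector $\boldsymbol{\mu}$. Then the moment
generating function is given by
$$\mathbb{E} e^{\mathbf{u}^T\mathbf{X}} = \frac{\sqrt{\det A}}{(2\pi)^{n/2}}\int_{\mathbb{R}^n} e^{\mathbf{u}^T\mathbf{x}} \exp\left\{-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^T A (\mathbf{x} - \boldsymbol{\mu})\right\} d\mathbf{x} = \exp\left\{\frac{1}{2}\mathbf{u}^T A^{-1}\mathbf{u} + \mathbf{u}^T\boldsymbol{\mu}\right\}.$$
If any $n$ random variables $X_1, X_2, \ldots, X_n$ have this moment generating function, then they are
jointly normal, and we can read out the means and covariances. The random variables are jointly
normal and independent if and only if for any real column vector $\mathbf{u} = (u_1, \ldots, u_n)^T$
$$\mathbb{E} e^{\mathbf{u}^T\mathbf{X}} \triangleq \mathbb{E} \exp\left\{\sum_{j=1}^n u_j X_j\right\} = \exp\left\{\sum_{j=1}^n \left(\frac{1}{2}\sigma_j^2 u_j^2 + u_j\mu_j\right)\right\}.$$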


Chapter 12 


Semi-Continuous Models 


12.1 Discrete-time Brownian Motion 


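Let $\{Y_j\}_{j=1}^n$ be a collection of independent, standard normal random variables defined on $(\Omega, \mathcal{F}, \mathbb{P})$,
where $\mathbb{P}$ is the market measure. As before we denote the column vector $(Y_1, \ldots, Y_n)^T$ by $\mathbf{Y}$. We
therefore have for any real column vector $\mathbf{u} = (u_1, \ldots, u_n)^T$,
$$\mathbb{E} e^{\mathbf{u}^T\mathbf{Y}} = \mathbb{E} \exp\left\{\sum_{j=1}^n u_j Y_j\right\} = \exp\left\{\frac{1}{2}\sum_{j=1}^n u_j^2\right\}.$$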

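Define the discrete-time Brownian motion (see Fig. 12.1):
$$B_0 = 0, \qquad B_k = \sum_{j=1}^k Y_j, \quad k = 1, \ldots, n.$$
If we know $Y_1, Y_2, \ldots, Y_k$, then we know $B_1, B_2, \ldots, B_k$. Conversely, if we know $B_1, B_2, \ldots, B_k$,
then we know $Y_1 = B_1$, $Y_2 = B_2 - B_1$, ..., $Y_k = B_k - B_{k-1}$. Define the filtration
$$\mathcal{F}_0 = \{\emptyset, \Omega\}, \qquad \mathcal{F}_k = \sigma(Y_1, Y_2, \ldots, Y_k) = \sigma(B_1, B_2, \ldots, B_k), \quad k = 1, \ldots, n.$$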

Theorem 1.34 $\{B_k\}_{k=0}^n$ is a martingale (under $\mathbb{P}$).

Proof:
$$\mathbb{E}[B_{k+1} \mid \mathcal{F}_k] = \mathbb{E}[Y_{k+1} + B_k \mid \mathcal{F}_k] = \mathbb{E}Y_{k+1} + B_k = B_k.$$
∎

Figure 12.1: Discrete-time Brownian motion.


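Theorem 1.35 $\{B_k\}_{k=0}^n$ is a Markov process.

Proof: Note that
$$\mathbb{E}[h(B_{k+1}) \mid \mathcal{F}_k] = \mathbb{E}[h(Y_{k+1} + B_k) \mid \mathcal{F}_k].$$
Use the Independence Lemma. Define
$$g(b) \triangleq \mathbb{E}\, h(Y_{k+1} + b) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^\infty h(y + b)\, e^{-\frac{y^2}{2}}\, dy.$$
Then
$$\mathbb{E}[h(Y_{k+1} + B_k) \mid \mathcal{F}_k] = g(B_k),$$
which is a function of $B_k$ alone. ∎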


12.2 The Stock Price Process 


Given parameters:

• $\mu \in \mathbb{R}$, the mean rate of return.
• $\sigma > 0$, the volatility.
• $S_0 > 0$, the initial stock price.

The stock price process is then given by

$$S_k = S_0 \exp\big\{\sigma B_k + (\mu - \tfrac12 \sigma^2) k\big\}, \quad k = 0, \ldots, n.$$


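Note that

$$S_{k+1} = S_k \exp\big\{\sigma Y_{k+1} + (\mu - \tfrac12 \sigma^2)\big\},$$

$$\mathbb{E}[S_{k+1} \mid \mathcal{F}_k] = S_k\, \mathbb{E}\big[e^{\sigma Y_{k+1}} \mid \mathcal{F}_k\big]\, e^{\mu - \frac12 \sigma^2} = S_k\, e^{\frac12 \sigma^2} e^{\mu - \frac12 \sigma^2} = e^{\mu} S_k.$$

Thus

$$\mu = \log \frac{\mathbb{E}[S_{k+1} \mid \mathcal{F}_k]}{S_k} = \log \mathbb{E}\Big[\frac{S_{k+1}}{S_k} \,\Big|\, \mathcal{F}_k\Big],$$

and

$$\operatorname{var}\Big(\log \frac{S_{k+1}}{S_k}\Big) = \operatorname{var}\big(\sigma Y_{k+1} + (\mu - \tfrac12 \sigma^2)\big) = \sigma^2.$$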
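A quick Monte Carlo sanity check of the two identities above, as a sketch only: it samples $S_{k+1}/S_k = \exp\{\sigma Y + \mu - \frac12\sigma^2\}$ with $Y \sim N(0,1)$ using Python's `random` module. The parameter values ($\mu = 0.05$, $\sigma = 0.2$), the sample size, and the function name are illustrative assumptions.

```python
import math
import random

def simulate_ratio_stats(mu, sigma, n_samples=100_000, seed=1):
    """Estimate E[S_{k+1}/S_k] and var(log(S_{k+1}/S_k)) by sampling
    log(S_{k+1}/S_k) = sigma*Y + mu - sigma^2/2 with Y ~ N(0, 1)."""
    rng = random.Random(seed)
    logs = [sigma * rng.gauss(0.0, 1.0) + mu - 0.5 * sigma**2 for _ in range(n_samples)]
    mean_ratio = sum(math.exp(x) for x in logs) / n_samples
    mean_log = sum(logs) / n_samples
    var_log = sum((x - mean_log) ** 2 for x in logs) / (n_samples - 1)
    return mean_ratio, var_log

mean_ratio, var_log = simulate_ratio_stats(mu=0.05, sigma=0.2)
# mean_ratio should be near exp(mu); var_log should be near sigma^2
```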


12.3 Remainder of the Market 


The other processes in the market are defined as follows. 


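Money market process:

$$M_k = e^{rk}, \quad k = 0, 1, \ldots, n.$$

Portfolio process:

• $\Delta_0, \Delta_1, \ldots, \Delta_{n-1}$.
• Each $\Delta_k$ is $\mathcal{F}_k$-measurable.

Wealth process:

• $X_0$ given, nonrandom.
• $X_{k+1} = \Delta_k S_{k+1} + e^r (X_k - \Delta_k S_k) = \Delta_k (S_{k+1} - e^r S_k) + e^r X_k$.
• Each $X_k$ is $\mathcal{F}_k$-measurable.

Discounted wealth process:

$$\frac{X_{k+1}}{M_{k+1}} = \Delta_k \Big(\frac{S_{k+1}}{M_{k+1}} - \frac{S_k}{M_k}\Big) + \frac{X_k}{M_k}.$$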


12.4 Risk-Neutral Measure 


Definition 12.1 Let $\widetilde{\mathbb{P}}$ be a probability measure on $(\Omega, \mathcal{F})$, equivalent to the market measure $\mathbb{P}$. If $\{S_k / M_k\}_{k=0}^n$ is a martingale under $\widetilde{\mathbb{P}}$, we say that $\widetilde{\mathbb{P}}$ is a risk-neutral measure.


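Theorem 4.36 If $\widetilde{\mathbb{P}}$ is a risk-neutral measure, then every discounted wealth process $\{X_k / M_k\}_{k=0}^n$ is a martingale under $\widetilde{\mathbb{P}}$, regardless of the portfolio process used to generate it.

Proof:

$$\begin{aligned} \widetilde{\mathbb{E}}\Big[\frac{X_{k+1}}{M_{k+1}} \,\Big|\, \mathcal{F}_k\Big] &= \widetilde{\mathbb{E}}\Big[\Delta_k \Big(\frac{S_{k+1}}{M_{k+1}} - \frac{S_k}{M_k}\Big) + \frac{X_k}{M_k} \,\Big|\, \mathcal{F}_k\Big] \\ &= \Delta_k \Big(\widetilde{\mathbb{E}}\Big[\frac{S_{k+1}}{M_{k+1}} \,\Big|\, \mathcal{F}_k\Big] - \frac{S_k}{M_k}\Big) + \frac{X_k}{M_k} \\ &= \frac{X_k}{M_k}. \end{aligned}$$

∎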


12.5 Risk-Neutral Pricing 


Let $V_n$ be the payoff at time n, and say it is $\mathcal{F}_n$-measurable. Note that $V_n$ may be path-dependent.
Hedging a short position:

• Sell the simple European derivative security $V_n$.
• Receive $X_0$ at time 0.
• Construct a portfolio process $\Delta_0, \ldots, \Delta_{n-1}$ which starts with $X_0$ and ends with $X_n = V_n$.
• If there is a risk-neutral measure $\widetilde{\mathbb{P}}$, then

$$X_0 = \widetilde{\mathbb{E}}\, \frac{X_n}{M_n} = \widetilde{\mathbb{E}}\, \frac{V_n}{M_n}.$$


Remark 12.1 Hedging in this “semi-continuous” model is usually not possible because there are 
not enough trading dates. This difficulty will disappear when we go to the fully continuous model. 


12.6 Arbitrage 

Definition 12.2 An arbitrage is a portfolio which starts with $X_0 = 0$ and ends with $X_n$ satisfying

$$\mathbb{P}(X_n \ge 0) = 1, \qquad \mathbb{P}(X_n > 0) > 0.$$

($\mathbb{P}$ here is the market measure.)


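Theorem 6.37 (Fundamental Theorem of Asset Pricing: Easy part) If there is a risk-neutral measure, then there is no arbitrage.

Proof: Let $\widetilde{\mathbb{P}}$ be a risk-neutral measure, let $X_0 = 0$, and let $X_n$ be the final wealth corresponding to any portfolio process. Since $\{X_k / M_k\}_{k=0}^n$ is a martingale under $\widetilde{\mathbb{P}}$,

$$\widetilde{\mathbb{E}}\, \frac{X_n}{M_n} = \widetilde{\mathbb{E}}\, \frac{X_0}{M_0} = 0. \tag{6.1}$$

Suppose $\mathbb{P}(X_n \ge 0) = 1$. Because $\mathbb{P}$ and $\widetilde{\mathbb{P}}$ are equivalent (they have the same null sets), we have

$$\mathbb{P}(X_n \ge 0) = 1 \implies \mathbb{P}(X_n < 0) = 0 \implies \widetilde{\mathbb{P}}(X_n < 0) = 0 \implies \widetilde{\mathbb{P}}(X_n \ge 0) = 1. \tag{6.2}$$

(6.1) and (6.2) imply $\widetilde{\mathbb{P}}(X_n = 0) = 1$, since $M_n > 0$ and a nonnegative random variable with zero expectation is zero almost surely. We have

$$\widetilde{\mathbb{P}}(X_n = 0) = 1 \implies \widetilde{\mathbb{P}}(X_n > 0) = 0 \implies \mathbb{P}(X_n > 0) = 0.$$

This is not an arbitrage. ∎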
12.7 Stalking the Risk-Neutral Measure 
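Recall that

• $Y_1, Y_2, \ldots, Y_n$ are independent, standard normal random variables on some probability space $(\Omega, \mathcal{F}, \mathbb{P})$.
• $S_k = S_0 \exp\big\{\sigma B_k + (\mu - \tfrac12 \sigma^2) k\big\}$.
• $S_{k+1} = S_0 \exp\big\{\sigma (B_k + Y_{k+1}) + (\mu - \tfrac12 \sigma^2)(k + 1)\big\} = S_k \exp\big\{\sigma Y_{k+1} + (\mu - \tfrac12 \sigma^2)\big\}$.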

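Therefore,

$$\frac{S_{k+1}}{M_{k+1}} = \frac{S_k}{M_k} \exp\big\{\sigma Y_{k+1} + (\mu - r - \tfrac12 \sigma^2)\big\},$$

$$\begin{aligned} \mathbb{E}\Big[\frac{S_{k+1}}{M_{k+1}} \,\Big|\, \mathcal{F}_k\Big] &= \frac{S_k}{M_k}\, \mathbb{E}\big[\exp\{\sigma Y_{k+1}\} \mid \mathcal{F}_k\big] \exp\{\mu - r - \tfrac12 \sigma^2\} \\ &= \frac{S_k}{M_k} \exp\{\tfrac12 \sigma^2\} \exp\{\mu - r - \tfrac12 \sigma^2\} \\ &= e^{\mu - r}\, \frac{S_k}{M_k}. \end{aligned}$$

If $\mu = r$, the market measure is risk neutral. If $\mu \ne r$, we must seek further.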
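$$\begin{aligned} \frac{S_{k+1}}{M_{k+1}} &= \frac{S_k}{M_k} \exp\big\{\sigma Y_{k+1} + (\mu - r - \tfrac12 \sigma^2)\big\} \\ &= \frac{S_k}{M_k} \exp\Big\{\sigma \Big(Y_{k+1} + \frac{\mu - r}{\sigma}\Big) - \tfrac12 \sigma^2\Big\} \\ &= \frac{S_k}{M_k} \exp\big\{\sigma \widetilde{Y}_{k+1} - \tfrac12 \sigma^2\big\}, \end{aligned}$$

where

$$\widetilde{Y}_{k+1} = Y_{k+1} + \frac{\mu - r}{\sigma}.$$

The quantity $\frac{\mu - r}{\sigma}$ is denoted $\theta$ and is called the market price of risk.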


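We want a probability measure $\widetilde{\mathbb{P}}$ under which $\widetilde{Y}_1, \ldots, \widetilde{Y}_n$ are independent, standard normal random variables. Then we would have

$$\begin{aligned} \widetilde{\mathbb{E}}\Big[\frac{S_{k+1}}{M_{k+1}} \,\Big|\, \mathcal{F}_k\Big] &= \frac{S_k}{M_k}\, \widetilde{\mathbb{E}}\big[\exp\{\sigma \widetilde{Y}_{k+1}\} \mid \mathcal{F}_k\big] \exp\{-\tfrac12 \sigma^2\} \\ &= \frac{S_k}{M_k} \exp\{\tfrac12 \sigma^2\} \exp\{-\tfrac12 \sigma^2\} \\ &= \frac{S_k}{M_k}. \end{aligned}$$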


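Cameron-Martin-Girsanov's Idea: Define the random variable

$$Z = \exp\Big\{\sum_{j=1}^n \big(-\theta Y_j - \tfrac12 \theta^2\big)\Big\}.$$

Properties of Z:

• $Z > 0$.
• $\mathbb{E} Z = \mathbb{E} \exp\Big\{\sum_{j=1}^n (-\theta Y_j)\Big\} \exp\Big\{-\frac{n\theta^2}{2}\Big\} = \exp\Big\{\frac{n\theta^2}{2}\Big\} \exp\Big\{-\frac{n\theta^2}{2}\Big\} = 1.$

Define

$$\widetilde{\mathbb{P}}(A) = \int_A Z\, d\mathbb{P} \quad \forall A \in \mathcal{F}.$$

Then $\widetilde{\mathbb{P}}(A) \ge 0$ for all $A \in \mathcal{F}$ and

$$\widetilde{\mathbb{P}}(\Omega) = \mathbb{E} Z = 1.$$

In other words, $\widetilde{\mathbb{P}}$ is a probability measure.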
We show that $\widetilde{\mathbb{P}}$ is a risk-neutral measure. For this, it suffices to show that

$$\widetilde{Y}_1 = Y_1 + \theta, \; \ldots, \; \widetilde{Y}_n = Y_n + \theta$$

are independent, standard normal under $\widetilde{\mathbb{P}}$.


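Verification:

• $Y_1, Y_2, \ldots, Y_n$: independent, standard normal under $\mathbb{P}$, and $\mathbb{E} \exp\big\{\sum_{j=1}^n u_j Y_j\big\} = \exp\big\{\tfrac12 \sum_{j=1}^n u_j^2\big\}$.
• $\widetilde{Y}_1 = Y_1 + \theta, \ldots, \widetilde{Y}_n = Y_n + \theta$.
• $Z > 0$ almost surely.
• $Z = \exp\big\{\sum_{j=1}^n (-\theta Y_j - \tfrac12 \theta^2)\big\}$, $\widetilde{\mathbb{P}}(A) = \int_A Z\, d\mathbb{P}$ $\forall A \in \mathcal{F}$, and $\widetilde{\mathbb{E}} X = \mathbb{E}(XZ)$ for every random variable $X$.
• Compute the moment generating function of $(\widetilde{Y}_1, \ldots, \widetilde{Y}_n)$ under $\widetilde{\mathbb{P}}$:

$$\begin{aligned} \widetilde{\mathbb{E}} \exp\Big\{\sum_{j=1}^n u_j \widetilde{Y}_j\Big\} &= \mathbb{E} \exp\Big\{\sum_{j=1}^n u_j (Y_j + \theta) + \sum_{j=1}^n \big(-\theta Y_j - \tfrac12 \theta^2\big)\Big\} \\ &= \mathbb{E} \exp\Big\{\sum_{j=1}^n (u_j - \theta) Y_j\Big\} \cdot \exp\Big\{\sum_{j=1}^n \big(u_j \theta - \tfrac12 \theta^2\big)\Big\} \\ &= \exp\Big\{\sum_{j=1}^n \tfrac12 (u_j - \theta)^2\Big\} \cdot \exp\Big\{\sum_{j=1}^n \big(u_j \theta - \tfrac12 \theta^2\big)\Big\} \\ &= \exp\Big\{\tfrac12 \sum_{j=1}^n u_j^2\Big\}. \end{aligned}$$

This is the moment generating function of n independent standard normal random variables, so $\widetilde{Y}_1, \ldots, \widetilde{Y}_n$ are independent, standard normal under $\widetilde{\mathbb{P}}$.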
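The change of measure can be checked numerically for $n = 1$. The sketch below (an illustration, not part of the notes' argument) samples $Y \sim N(0,1)$ under $\mathbb{P}$ with Python's `random` module, weights each sample by $Z = \exp\{-\theta Y - \frac12\theta^2\}$, and uses $\widetilde{\mathbb{E}} X = \mathbb{E}(XZ)$ to estimate $\mathbb{E} Z$ and the $\widetilde{\mathbb{P}}$-m.g.f. of $\widetilde{Y} = Y + \theta$ at one point $u$. The values $\theta = 0.5$, $u = 1$ and the sample size are assumptions.

```python
import math
import random

def girsanov_check(theta, u, n_samples=100_000, seed=7):
    """Monte Carlo sketch for n = 1: returns estimates of E[Z] and of
    E~[exp(u * Ytilde)] = E[Z * exp(u * (Y + theta))], with Y ~ N(0,1) under P."""
    rng = random.Random(seed)
    sum_z = 0.0
    sum_mgf = 0.0
    for _ in range(n_samples):
        y = rng.gauss(0.0, 1.0)
        z = math.exp(-theta * y - 0.5 * theta**2)   # Radon-Nikodym weight Z
        sum_z += z
        sum_mgf += z * math.exp(u * (y + theta))    # X = exp(u * Ytilde), weighted by Z
    return sum_z / n_samples, sum_mgf / n_samples

ez, mgf = girsanov_check(theta=0.5, u=1.0)
# ez should be near 1; mgf should be near exp(u^2 / 2), the standard normal m.g.f.
```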
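12.8 Pricing a European Call

Stock price at time n is

$$\begin{aligned} S_n &= S_0 \exp\big\{\sigma B_n + (\mu - \tfrac12 \sigma^2) n\big\} \\ &= S_0 \exp\Big\{\sigma \sum_{j=1}^n Y_j + (\mu - \tfrac12 \sigma^2) n\Big\} \\ &= S_0 \exp\Big\{\sigma \sum_{j=1}^n \Big(Y_j + \frac{\mu - r}{\sigma}\Big) - (\mu - r) n + (\mu - \tfrac12 \sigma^2) n\Big\} \\ &= S_0 \exp\Big\{\sigma \sum_{j=1}^n \widetilde{Y}_j + (r - \tfrac12 \sigma^2) n\Big\}. \end{aligned}$$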


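Payoff at time n is $(S_n - K)^+$. Price at time zero is

$$\begin{aligned} \widetilde{\mathbb{E}}\, \frac{(S_n - K)^+}{M_n} &= \widetilde{\mathbb{E}}\Big[e^{-rn} \Big(S_0 \exp\Big\{\sigma \sum_{j=1}^n \widetilde{Y}_j + (r - \tfrac12 \sigma^2) n\Big\} - K\Big)^+\Big] \\ &= \int_{-\infty}^{\infty} e^{-rn} \Big(S_0 \exp\big\{\sigma b + (r - \tfrac12 \sigma^2) n\big\} - K\Big)^+ \frac{1}{\sqrt{2\pi n}}\, e^{-\frac{b^2}{2n}}\, db, \end{aligned}$$

since $\sum_{j=1}^n \widetilde{Y}_j$ is normal with mean 0, variance n, under $\widetilde{\mathbb{P}}$.

This is the Black-Scholes price. It does not depend on $\mu$.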
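The integral can be evaluated numerically and compared with the familiar closed-form Black-Scholes call formula (with maturity $T = n$ periods). The closed form is the standard textbook formula, assumed here rather than derived in this section; the trapezoidal quadrature, its truncation at $|b| \le 10\sqrt{n}$, and all function names are illustrative choices.

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def bs_call(S0, K, r, sigma, T):
    """Closed-form Black-Scholes call price (standard formula, assumed)."""
    d1 = (math.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S0 * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)

def call_by_quadrature(S0, K, r, sigma, n, steps=20_000, width=10.0):
    """Trapezoidal evaluation of
    e^{-rn} * int (S0 exp{sigma b + (r - sigma^2/2) n} - K)^+ * phi_n(b) db,
    where phi_n is the N(0, n) density, truncated to |b| <= width * sqrt(n)."""
    sd = math.sqrt(n)
    lo, hi = -width * sd, width * sd
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        b = lo + i * h
        payoff = max(S0 * math.exp(sigma * b + (r - 0.5 * sigma**2) * n) - K, 0.0)
        density = math.exp(-b * b / (2.0 * n)) / math.sqrt(2.0 * math.pi * n)
        weight = 0.5 if i in (0, steps) else 1.0   # trapezoid end-point weights
        total += weight * payoff * density
    return math.exp(-r * n) * h * total

print(call_by_quadrature(100.0, 100.0, 0.05, 0.2, 1), bs_call(100.0, 100.0, 0.05, 0.2, 1.0))
```

Note that the mean rate of return $\mu$ is not an argument of either function, in line with the remark that the price does not depend on $\mu$.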


Chapter 13 


Brownian Motion 


13.1 Symmetric Random Walk 


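Toss a fair coin infinitely many times. Define

$$X_j(\omega) = \begin{cases} 1 & \text{if } \omega_j = H, \\ -1 & \text{if } \omega_j = T. \end{cases}$$

Set

$$M_0 = 0, \qquad M_k = \sum_{j=1}^k X_j, \quad k \ge 1.$$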


13.2 The Law of Large Numbers 


We will use the method of moment generating functions to derive the Law of Large Numbers:

Theorem 2.38 (Law of Large Numbers)

$$\frac{1}{k} M_k \to 0 \quad \text{almost surely, as } k \to \infty.$$
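Proof:

$$\begin{aligned} \varphi_k(u) &= \mathbb{E} \exp\Big\{\frac{u}{k} M_k\Big\} \\ &= \mathbb{E} \exp\Big\{\sum_{j=1}^k \frac{u}{k} X_j\Big\} && \text{(Def. of } M_k) \\ &= \prod_{j=1}^k \mathbb{E} \exp\Big\{\frac{u}{k} X_j\Big\} && \text{(Independence of the } X_j\text{'s)} \\ &= \Big(\tfrac12 e^{u/k} + \tfrac12 e^{-u/k}\Big)^k, \end{aligned}$$

which implies

$$\log \varphi_k(u) = k \log\Big(\tfrac12 e^{u/k} + \tfrac12 e^{-u/k}\Big).$$

Let $x = \frac{1}{k}$. Then

$$\lim_{k \to \infty} \log \varphi_k(u) = \lim_{x \to 0} \frac{\log\big(\tfrac12 e^{ux} + \tfrac12 e^{-ux}\big)}{x} = \lim_{x \to 0} \frac{\tfrac{u}{2} e^{ux} - \tfrac{u}{2} e^{-ux}}{\tfrac12 e^{ux} + \tfrac12 e^{-ux}} = 0 \quad \text{(L'Hôpital's Rule)}.$$

Therefore,

$$\lim_{k \to \infty} \varphi_k(u) = e^0 = 1,$$

which is the m.g.f. for the constant 0. (Strictly speaking, this m.g.f. argument shows that $\frac1k M_k$ converges to 0 in distribution, and hence in probability; the almost-sure statement is the Strong Law of Large Numbers, which requires a separate argument.) ∎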


13.3 Central Limit Theorem 


We use the method of moment generating functions to prove the Central Limit Theorem.

Theorem 3.39 (Central Limit Theorem)

$$\frac{1}{\sqrt{k}} M_k \to \text{standard normal, as } k \to \infty.$$

Proof:
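$$\varphi_k(u) = \mathbb{E} \exp\Big\{\frac{u}{\sqrt{k}} M_k\Big\} = \Big(\tfrac12 e^{u/\sqrt{k}} + \tfrac12 e^{-u/\sqrt{k}}\Big)^k,$$

so that

$$\log \varphi_k(u) = k \log\Big(\tfrac12 e^{u/\sqrt{k}} + \tfrac12 e^{-u/\sqrt{k}}\Big).$$

Let $x = \frac{1}{\sqrt{k}}$. Then

$$\begin{aligned} \lim_{k \to \infty} \log \varphi_k(u) &= \lim_{x \to 0} \frac{\log\big(\tfrac12 e^{ux} + \tfrac12 e^{-ux}\big)}{x^2} \\ &= \lim_{x \to 0} \frac{\tfrac{u}{2} e^{ux} - \tfrac{u}{2} e^{-ux}}{2x \big(\tfrac12 e^{ux} + \tfrac12 e^{-ux}\big)} && \text{(L'Hôpital's Rule)} \\ &= \lim_{x \to 0} \frac{1}{\tfrac12 e^{ux} + \tfrac12 e^{-ux}} \cdot \lim_{x \to 0} \frac{\tfrac{u}{2} e^{ux} - \tfrac{u}{2} e^{-ux}}{2x} \\ &= \lim_{x \to 0} \frac{\tfrac{u}{2} e^{ux} - \tfrac{u}{2} e^{-ux}}{2x} \\ &= \lim_{x \to 0} \frac{\tfrac{u^2}{2} e^{ux} + \tfrac{u^2}{2} e^{-ux}}{2} && \text{(L'Hôpital's Rule)} \\ &= \tfrac12 u^2. \end{aligned}$$

Therefore,

$$\lim_{k \to \infty} \varphi_k(u) = e^{\frac12 u^2},$$

which is the m.g.f. for a standard normal random variable. ∎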
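Both limits can be watched numerically. Since $\frac12 e^{x} + \frac12 e^{-x} = \cosh x$, the two m.g.f.s above are $\varphi_k(u) = \cosh(u/k)^k$ (Law of Large Numbers scaling) and $\varphi_k(u) = \cosh(u/\sqrt{k})^k$ (Central Limit Theorem scaling). This small sketch (function names are hypothetical) evaluates them through logarithms to avoid overflow.

```python
import math

def mgf_lln(u, k):
    """phi_k(u) = E exp{(u/k) M_k} = cosh(u/k)^k for the symmetric random walk."""
    return math.exp(k * math.log(math.cosh(u / k)))

def mgf_clt(u, k):
    """phi_k(u) = E exp{(u/sqrt(k)) M_k} = cosh(u/sqrt(k))^k."""
    return math.exp(k * math.log(math.cosh(u / math.sqrt(k))))

u = 2.0
for k in (10, 1_000, 1_000_000):
    # LLN column tends to 1 (constant 0); CLT column tends to exp(u^2/2)
    print(k, mgf_lln(u, k), mgf_clt(u, k), math.exp(0.5 * u * u))
```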


13.4 Brownian Motion as a Limit of Random Walks 


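Let n be a positive integer. If $t \ge 0$ is of the form $\frac{k}{n}$, then set

$$B^{(n)}(t) = \frac{1}{\sqrt{n}} M_{tn} = \frac{1}{\sqrt{n}} M_k.$$

If $t \ge 0$ is not of the form $\frac{k}{n}$, then define $B^{(n)}(t)$ by linear interpolation (see Fig. 13.1).

Here are some properties of $B^{(100)}(t)$: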
Figure 13.1: Linear interpolation to define $B^{(n)}(t)$.


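Properties of $B^{(100)}(1)$:

$$B^{(100)}(1) = \frac{1}{10} \sum_{j=1}^{100} X_j \quad \text{(approximately normal)},$$

$$\mathbb{E} B^{(100)}(1) = \frac{1}{10} \sum_{j=1}^{100} \mathbb{E} X_j = 0, \qquad \operatorname{var}\big(B^{(100)}(1)\big) = \frac{1}{100} \sum_{j=1}^{100} \operatorname{var}(X_j) = 1.$$

Properties of $B^{(100)}(2)$:

$$B^{(100)}(2) = \frac{1}{10} \sum_{j=1}^{200} X_j \quad \text{(approximately normal)},$$

$$\mathbb{E} B^{(100)}(2) = 0, \qquad \operatorname{var}\big(B^{(100)}(2)\big) = 2.$$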


Also note that:

• $B^{(100)}(1)$ and $B^{(100)}(2) - B^{(100)}(1)$ are independent.
• $B^{(100)}(t)$ is a continuous function of t.

To get Brownian motion, let $n \to \infty$ in $B^{(n)}(t)$, $t \ge 0$.
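The scaled, interpolated walk $B^{(n)}(t)$ is easy to evaluate from a finite list of tosses. The sketch below assumes the tosses are given as a Python list of $+1/-1$ values (a hypothetical input format) and raises an error when $t$ needs more tosses than supplied.

```python
import math

def scaled_random_walk(tosses, n, t):
    """Evaluate B^(n)(t) from tosses X_1, X_2, ... (each +1 or -1).

    On the grid t = k/n this is M_k / sqrt(n); between grid points it is the
    linear interpolation of the two neighbouring grid values.
    """
    M = [0]
    for x in tosses:
        M.append(M[-1] + x)          # M_k = X_1 + ... + X_k
    k = math.floor(n * t)
    frac = n * t - k                 # position of t inside [k/n, (k+1)/n)
    if frac == 0:
        if k >= len(M):
            raise ValueError("not enough tosses for this t")
        return M[k] / math.sqrt(n)
    if k + 1 >= len(M):
        raise ValueError("not enough tosses for this t")
    return ((1 - frac) * M[k] + frac * M[k + 1]) / math.sqrt(n)

tosses = [1, 1, -1, 1]               # M = 0, 1, 2, 1, 2
print(scaled_random_walk(tosses, 4, 0.5), scaled_random_walk(tosses, 4, 0.625))
```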


13.5 Brownian Motion 


(Please refer to Oksendal, Chapter 2.) 
Figure 13.2: Continuous-time Brownian Motion.


A process $B(t)$, $t \ge 0$ (see Fig. 13.2), is called a Brownian Motion if it satisfies the following properties:

1. $B(0) = 0$,

2. $B(t)$ is a continuous function of t;

3. B has independent, normally distributed increments: If

$$0 = t_0 < t_1 < t_2 < \cdots < t_n$$

and

$$Y_1 = B(t_1) - B(t_0), \quad Y_2 = B(t_2) - B(t_1), \quad \ldots, \quad Y_n = B(t_n) - B(t_{n-1}),$$

then
• $Y_1, Y_2, \ldots, Y_n$ are independent,
• $\mathbb{E} Y_j = 0 \ \forall j$,
• $\operatorname{var}(Y_j) = t_j - t_{j-1} \ \forall j$.


13.6 Covariance of Brownian Motion 


Let $0 \le s \le t$ be given. Then $B(s)$ and $B(t) - B(s)$ are independent, so $B(s)$ and $B(t) = (B(t) - B(s)) + B(s)$ are jointly normal. Moreover,

$$\mathbb{E} B(s) = 0, \qquad \operatorname{var}(B(s)) = s,$$
$$\mathbb{E} B(t) = 0, \qquad \operatorname{var}(B(t)) = t,$$
$$\mathbb{E} B(s) B(t) = \mathbb{E} B(s)\big[(B(t) - B(s)) + B(s)\big] = \underbrace{\mathbb{E} B(s)(B(t) - B(s))}_{0} + \underbrace{\mathbb{E} B^2(s)}_{s} = s.$$

Thus for any $s \ge 0$, $t \ge 0$ (not necessarily $s \le t$), we have

$$\mathbb{E} B(s) B(t) = s \wedge t.$$


13.7 Finite-Dimensional Distributions of Brownian Motion 


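Let

$$0 < t_1 < t_2 < \cdots < t_n$$

be given. Then

$$\big(B(t_1), B(t_2), \ldots, B(t_n)\big)$$

is jointly normal with covariance matrix

$$C = \begin{pmatrix} \mathbb{E} B^2(t_1) & \mathbb{E} B(t_1) B(t_2) & \cdots & \mathbb{E} B(t_1) B(t_n) \\ \mathbb{E} B(t_2) B(t_1) & \mathbb{E} B^2(t_2) & \cdots & \mathbb{E} B(t_2) B(t_n) \\ \vdots & \vdots & \ddots & \vdots \\ \mathbb{E} B(t_n) B(t_1) & \mathbb{E} B(t_n) B(t_2) & \cdots & \mathbb{E} B^2(t_n) \end{pmatrix} = \begin{pmatrix} t_1 & t_1 & \cdots & t_1 \\ t_1 & t_2 & \cdots & t_2 \\ \vdots & \vdots & \ddots & \vdots \\ t_1 & t_2 & \cdots & t_n \end{pmatrix}.$$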
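As a numerical illustration (a sketch, not from the notes), the matrix $C_{ij} = t_i \wedge t_j$ can be compared with the empirical second moments of simulated vectors $(B(t_1), \ldots, B(t_n))$. The sampler adds independent $N(0, t_j - t_{j-1})$ increments, which is property 3 of the definition; the times $0.5, 1, 2$, the sample size, and the function names are illustrative assumptions.

```python
import math
import random

def brownian_cov_matrix(times):
    """Covariance matrix C[i][j] = min(t_i, t_j)."""
    return [[min(s, t) for t in times] for s in times]

def sample_brownian_at(times, rng):
    """Sample (B(t_1), ..., B(t_n)) by summing independent N(0, t_j - t_{j-1}) increments."""
    values, prev_t, b = [], 0.0, 0.0
    for t in times:
        b += rng.gauss(0.0, math.sqrt(t - prev_t))
        values.append(b)
        prev_t = t
    return values

def empirical_cov(times, n_samples=100_000, seed=3):
    """Average of B(t_i) B(t_j) over samples (all means are 0, so this estimates C)."""
    rng = random.Random(seed)
    n = len(times)
    acc = [[0.0] * n for _ in range(n)]
    for _ in range(n_samples):
        v = sample_brownian_at(times, rng)
        for i in range(n):
            for j in range(n):
                acc[i][j] += v[i] * v[j]
    return [[a / n_samples for a in row] for row in acc]

times = [0.5, 1.0, 2.0]
C = brownian_cov_matrix(times)
E = empirical_cov(times)
```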


13.8 Filtration generated by a Brownian Motion 


$$\{\mathcal{F}(t)\}_{t \ge 0}$$

Required properties:

• For each t, $B(t)$ is $\mathcal{F}(t)$-measurable,

• For each t and for $t < t_1 < t_2 < \cdots < t_n$, the Brownian motion increments

$$B(t_1) - B(t), \quad B(t_2) - B(t_1), \quad \ldots, \quad B(t_n) - B(t_{n-1})$$

are independent of $\mathcal{F}(t)$.

Here is one way to construct $\mathcal{F}(t)$. First fix t. Let $s \in [0, t]$ and $C \in \mathcal{B}(\mathbb{R})$ be given. Put the set

$$\{B(s) \in C\} = \{\omega : B(s, \omega) \in C\}$$

in $\mathcal{F}(t)$. Do this for all possible numbers $s \in [0, t]$ and $C \in \mathcal{B}(\mathbb{R})$. Then put in every other set required by the σ-algebra properties.

This $\mathcal{F}(t)$ contains exactly the information learned by observing the Brownian motion up to time t. $\{\mathcal{F}(t)\}_{t \ge 0}$ is called the filtration generated by the Brownian motion.
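13.9 Martingale Property

Theorem 9.40 Brownian motion is a martingale.

Proof: Let $0 \le s \le t$ be given. Then

$$\mathbb{E}[B(t) \mid \mathcal{F}(s)] = \mathbb{E}\big[(B(t) - B(s)) + B(s) \mid \mathcal{F}(s)\big] = \mathbb{E}[B(t) - B(s)] + B(s) = B(s).$$

∎

Theorem 9.41 Let $\theta \in \mathbb{R}$ be given. Then

$$Z(t) = \exp\big\{-\theta B(t) - \tfrac12 \theta^2 t\big\}$$

is a martingale.

Proof: Let $0 \le s \le t$ be given. Then

$$\begin{aligned} \mathbb{E}[Z(t) \mid \mathcal{F}(s)] &= \mathbb{E}\Big[\exp\big\{-\theta (B(t) - B(s) + B(s)) - \tfrac12 \theta^2 ((t - s) + s)\big\} \,\Big|\, \mathcal{F}(s)\Big] \\ &= \mathbb{E}\Big[Z(s) \exp\big\{-\theta (B(t) - B(s)) - \tfrac12 \theta^2 (t - s)\big\} \,\Big|\, \mathcal{F}(s)\Big] \\ &= Z(s)\, \mathbb{E}\Big[\exp\big\{-\theta (B(t) - B(s)) - \tfrac12 \theta^2 (t - s)\big\}\Big] \\ &= Z(s) \exp\big\{\tfrac12 (-\theta)^2 \operatorname{var}(B(t) - B(s)) - \tfrac12 \theta^2 (t - s)\big\} \\ &= Z(s). \end{aligned}$$

∎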


13.10 The Limit of a Binomial Model 


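Consider the nth Binomial model with the following parameters:

• $u_n = 1 + \frac{\sigma}{\sqrt{n}}$: "Up" factor ($\sigma > 0$).
• $d_n = 1 - \frac{\sigma}{\sqrt{n}}$: "Down" factor.
• $r = 0$.
• $\tilde{p}_n = \frac{1 + r - d_n}{u_n - d_n} = \frac{\sigma/\sqrt{n}}{2\sigma/\sqrt{n}} = \frac12$, $\tilde{q}_n = \frac12$.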
Let $\#_k(H)$ denote the number of H in the first k tosses, and let $\#_k(T)$ denote the number of T in the first k tosses. Then

$$\#_k(H) + \#_k(T) = k, \qquad \#_k(H) - \#_k(T) = M_k,$$

which implies

$$\#_k(H) = \tfrac12 (k + M_k), \qquad \#_k(T) = \tfrac12 (k - M_k).$$

In the nth model, take n steps per unit time. Set $S_0^{(n)} = 1$. Let $t = \frac{k}{n}$ for some k, and let

$$S^{(n)}(t) = \Big(1 + \frac{\sigma}{\sqrt{n}}\Big)^{\frac12 (nt + M_{nt})} \Big(1 - \frac{\sigma}{\sqrt{n}}\Big)^{\frac12 (nt - M_{nt})}.$$

Under $\widetilde{\mathbb{P}}$, the price process $S^{(n)}$ is a martingale.
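Theorem 10.42 As $n \to \infty$, the distribution of $S^{(n)}(t)$ converges to the distribution of

$$\exp\big\{\sigma B(t) - \tfrac12 \sigma^2 t\big\},$$

where B is a Brownian motion. Note that the correction $-\frac12 \sigma^2 t$ is necessary in order to have a martingale.

Proof: Recall that from the Taylor series we have

$$\log(1 + x) = x - \tfrac12 x^2 + O(x^3),$$

so

$$\begin{aligned} \log S^{(n)}(t) &= \tfrac12 (nt + M_{nt}) \log\Big(1 + \frac{\sigma}{\sqrt{n}}\Big) + \tfrac12 (nt - M_{nt}) \log\Big(1 - \frac{\sigma}{\sqrt{n}}\Big) \\ &= nt \Big(\tfrac12 \log\Big(1 + \frac{\sigma}{\sqrt{n}}\Big) + \tfrac12 \log\Big(1 - \frac{\sigma}{\sqrt{n}}\Big)\Big) + M_{nt} \Big(\tfrac12 \log\Big(1 + \frac{\sigma}{\sqrt{n}}\Big) - \tfrac12 \log\Big(1 - \frac{\sigma}{\sqrt{n}}\Big)\Big) \\ &= nt \Big(-\frac{\sigma^2}{2n} + O(n^{-3/2})\Big) + M_{nt} \Big(\frac{\sigma}{\sqrt{n}} + O(n^{-3/2})\Big) \\ &= -\tfrac12 \sigma^2 t + O(n^{-1/2}) + \sigma \frac{1}{\sqrt{n}} M_{nt} + M_{nt}\, O(n^{-3/2}). \end{aligned}$$

Since $\frac{1}{\sqrt{n}} M_{nt} = B^{(n)}(t)$ converges in distribution to $B(t)$ (Section 13.4), and $M_{nt}\, O(n^{-3/2}) \to 0$, as $n \to \infty$ the distribution of $\log S^{(n)}(t)$ approaches the distribution of $\sigma B(t) - \frac12 \sigma^2 t$. ∎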
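The convergence of the first two moments can be checked exactly, without simulation. Under $\tilde{p}_n = \tilde{q}_n = \frac12$ we have $\mathbb{E} M_{nt} = 0$ and $\operatorname{var} M_{nt} = nt$, so with $x = \sigma/\sqrt{n}$ the formula for $\log S^{(n)}(t)$ gives mean $nt \cdot \frac12 \log(1 - x^2)$ and variance $nt \cdot \operatorname{atanh}(x)^2$ (since $\frac12[\log(1+x) - \log(1-x)] = \operatorname{atanh} x$). This sketch, with hypothetical names and assuming $nt$ is an integer as in the text, shows them approaching $-\frac12\sigma^2 t$ and $\sigma^2 t$.

```python
import math

def log_price_moments(n, t, sigma):
    """Exact mean and variance of log S^(n)(t) under p = q = 1/2, using
    E M_{nt} = 0 and var M_{nt} = nt (nt is assumed to be an integer)."""
    steps = round(n * t)
    x = sigma / math.sqrt(n)
    mean = steps * 0.5 * math.log1p(-x * x)   # (1/2)[log(1+x) + log(1-x)] per step
    var = steps * math.atanh(x) ** 2          # ((1/2)[log(1+x) - log(1-x)])^2 per step
    return mean, var

for n in (10, 1_000, 1_000_000):
    print(n, log_price_moments(n, 1.0, 0.3))  # tends to (-0.045, 0.09)
```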
Figure 13.3: Continuous-time Brownian Motion, starting at $x \ne 0$.
13.11 Starting at Points Other Than 0 


(The remaining sections in this chapter were taught Dec 7.) 


For a Brownian motion $B(t)$ that starts at 0, we have:

$$\mathbb{P}(B(0) = 0) = 1.$$

For a Brownian motion $B(t)$ that starts at x, denote the corresponding probability measure by $\mathbb{P}^x$ (see Fig. 13.3), and for such a Brownian motion we have:

$$\mathbb{P}^x(B(0) = x) = 1.$$

Note that:

• If $x \ne 0$, then $\mathbb{P}^x$ puts all its probability on a completely different set from $\mathbb{P}$.
• The distribution of $B(t)$ under $\mathbb{P}^x$ is the same as the distribution of $x + B(t)$ under $\mathbb{P}$.


13.12 Markov Property for Brownian Motion 


We prove that

Theorem 12.43 Brownian motion has the Markov property.

Proof: Let $s \ge 0$, $t \ge 0$ be given (see Fig. 13.4).

$$\mathbb{E}\big[h(B(s + t)) \mid \mathcal{F}(s)\big] = \mathbb{E}\Big[h\big(\underbrace{B(s + t) - B(s)}_{\text{independent of } \mathcal{F}(s)} + \underbrace{B(s)}_{\mathcal{F}(s)\text{-measurable}}\big) \,\Big|\, \mathcal{F}(s)\Big]$$

Figure 13.4: Markov Property of Brownian Motion.


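Use the Independence Lemma. Define

$$g(x) = \mathbb{E}\big[h(B(s + t) - B(s) + x)\big] = \mathbb{E}\big[h(x + \underbrace{B(t)}_{\text{same distribution as } B(s+t) - B(s)})\big] = \mathbb{E}^x h(B(t)).$$

Then

$$\mathbb{E}\big[h(B(s + t)) \mid \mathcal{F}(s)\big] = g(B(s)) = \mathbb{E}^{B(s)} h(B(t)).$$

∎

In fact Brownian motion has the strong Markov property.

Example 13.1 (Strong Markov Property) See Fig. 13.5. Fix $x > 0$ and define

$$\tau = \min\{t \ge 0;\ B(t) = x\}.$$

Then we have:

$$\mathbb{E}\big[h(B(\tau + t)) \mid \mathcal{F}(\tau)\big] = g(B(\tau)) = \mathbb{E}^x h(B(t)).$$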
Figure 13.5: Strong Markov Property of Brownian Motion.


13.13 Transition Density 


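Let $p(t, x, y)$ be the probability density for the Brownian motion to move from x to y in time t, and let τ be defined as in the previous section. Then

$$p(t, x, y) = \frac{1}{\sqrt{2\pi t}}\, e^{-\frac{(y - x)^2}{2t}},$$

$$g(x) = \mathbb{E}^x h(B(t)) = \int_{-\infty}^{\infty} h(y)\, p(t, x, y)\, dy.$$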


13.14 First Passage Time 


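Fix $x > 0$. Define

$$\tau = \min\{t \ge 0;\ B(t) = x\}.$$

Fix $\theta > 0$. Then

$$\exp\big\{\theta B(t \wedge \tau) - \tfrac12 \theta^2 (t \wedge \tau)\big\}$$

is a martingale, and

$$\mathbb{E} \exp\big\{\theta B(t \wedge \tau) - \tfrac12 \theta^2 (t \wedge \tau)\big\} = 1.$$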
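We have

$$\lim_{t \to \infty} \exp\big\{\theta B(t \wedge \tau) - \tfrac12 \theta^2 (t \wedge \tau)\big\} = \begin{cases} \exp\{\theta x - \tfrac12 \theta^2 \tau\} & \text{if } \tau < \infty, \\ 0 & \text{if } \tau = \infty, \end{cases} \tag{14.1}$$

$$0 \le \exp\big\{\theta B(t \wedge \tau) - \tfrac12 \theta^2 (t \wedge \tau)\big\} \le e^{\theta x}.$$

Let $t \to \infty$ in the identity $\mathbb{E} \exp\{\theta B(t \wedge \tau) - \frac12 \theta^2 (t \wedge \tau)\} = 1$, using (14.1) and the Bounded Convergence Theorem, to get

$$\mathbb{E}\Big[\mathbb{1}_{\{\tau < \infty\}} \exp\big\{\theta x - \tfrac12 \theta^2 \tau\big\}\Big] = 1.$$

Let $\theta \downarrow 0$ to get $\mathbb{E}\, \mathbb{1}_{\{\tau < \infty\}} = 1$, so

$$\mathbb{P}\{\tau < \infty\} = 1, \qquad \mathbb{E} \exp\big\{-\tfrac12 \theta^2 \tau\big\} = e^{-\theta x}. \tag{14.2}$$

Let $\alpha = \frac12 \theta^2$. We have the m.g.f.:

$$\mathbb{E}\, e^{-\alpha \tau} = e^{-x\sqrt{2\alpha}}, \quad \alpha > 0. \tag{14.3}$$

Differentiation of (14.3) w.r.t. α yields

$$-\mathbb{E}\big[\tau e^{-\alpha \tau}\big] = -\frac{x}{\sqrt{2\alpha}}\, e^{-x\sqrt{2\alpha}}.$$

Letting $\alpha \downarrow 0$, we obtain

$$\mathbb{E} \tau = \infty. \tag{14.4}$$

Conclusion. Brownian motion reaches level x with probability 1. The expected time to reach level x is infinite.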


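We use the Reflection Principle below (see Fig. 13.6). Reflecting each path about level x after time τ gives

$$\mathbb{P}\{\tau \le t,\ B(t) < x\} = \mathbb{P}\{B(t) > x\},$$

and, since $\{B(t) > x\} \subset \{\tau \le t\}$ and $\mathbb{P}\{B(t) = x\} = 0$,

$$\begin{aligned} \mathbb{P}\{\tau \le t\} &= \mathbb{P}\{\tau \le t,\ B(t) < x\} + \mathbb{P}\{\tau \le t,\ B(t) > x\} \\ &= \mathbb{P}\{B(t) > x\} + \mathbb{P}\{B(t) > x\} \\ &= 2\, \mathbb{P}\{B(t) > x\} \\ &= \frac{2}{\sqrt{2\pi t}} \int_x^{\infty} e^{-\frac{y^2}{2t}}\, dy. \end{aligned}$$

Figure 13.6: Reflection Principle in Brownian Motion.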


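Using the substitution $z = \frac{y}{\sqrt{t}}$, $dz = \frac{dy}{\sqrt{t}}$, we get

$$\mathbb{P}\{\tau \le t\} = \frac{2}{\sqrt{2\pi}} \int_{x/\sqrt{t}}^{\infty} e^{-\frac{z^2}{2}}\, dz.$$

Density:

$$f_\tau(t) = \frac{\partial}{\partial t} \mathbb{P}\{\tau \le t\} = \frac{x}{\sqrt{2\pi t^3}}\, e^{-\frac{x^2}{2t}},$$

which follows from the fact that if

$$F(t) = \int_{a(t)}^{b} g(z)\, dz,$$

then

$$\frac{\partial F}{\partial t} = -\frac{\partial a}{\partial t}\, g(a(t)).$$

Laplace transform formula:

$$\mathbb{E}\, e^{-\alpha \tau} = \int_0^{\infty} e^{-\alpha t} f_\tau(t)\, dt = e^{-x\sqrt{2\alpha}}.$$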
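The density, the distribution function and the Laplace transform can be cross-checked by numerical integration. In this sketch $\mathbb{P}\{\tau \le t\} = 2\mathbb{P}\{B(t) > x\}$ is written as `math.erfc(x / math.sqrt(2 t))`; the midpoint rule, the truncation of the Laplace integral at $t = 50$, and the choice $x = \alpha = 1$ are illustrative assumptions.

```python
import math

def first_passage_density(t, x):
    """f_tau(t) = x / sqrt(2 pi t^3) * exp(-x^2 / (2t)), t > 0."""
    return x / math.sqrt(2.0 * math.pi * t**3) * math.exp(-x * x / (2.0 * t))

def first_passage_cdf(t, x):
    """P{tau <= t} = 2 P{B(t) > x} = erfc(x / sqrt(2t))."""
    return math.erfc(x / math.sqrt(2.0 * t))

def midpoint(f, a, b, steps):
    """Composite midpoint rule on [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

x, alpha = 1.0, 1.0
cdf_numeric = midpoint(lambda s: first_passage_density(s, x), 0.0, 1.0, 100_000)
laplace_numeric = midpoint(lambda s: math.exp(-alpha * s) * first_passage_density(s, x),
                           0.0, 50.0, 200_000)
laplace_exact = math.exp(-x * math.sqrt(2.0 * alpha))
print(cdf_numeric, first_passage_cdf(1.0, x), laplace_numeric, laplace_exact)
```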
Chapter 14


The Ito Integral 


The following chapters deal with Stochastic Differential Equations in Finance. References: 


1. B. Oksendal, Stochastic Differential Equations, Springer-Verlag, 1995.


2. J. Hull, Options, Futures and other Derivative Securities, Prentice Hall, 1993. 


14.1 Brownian Motion 


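(See Fig. 13.3.) $(\Omega, \mathcal{F}, \mathbb{P})$ is given, always in the background, even when not explicitly mentioned. Brownian motion, $B(t, \omega) : [0, \infty) \times \Omega \to \mathbb{R}$, has the following properties:

1. $B(0) = 0$; technically, $\mathbb{P}\{\omega;\ B(0, \omega) = 0\} = 1$,
2. $B(t)$ is a continuous function of t,

3. If $0 = t_0 \le t_1 \le \cdots \le t_n$, then the increments

$$B(t_1) - B(t_0), \quad \ldots, \quad B(t_n) - B(t_{n-1})$$

are independent, normal, and

$$\mathbb{E}\big[B(t_{k+1}) - B(t_k)\big] = 0, \qquad \mathbb{E}\big[B(t_{k+1}) - B(t_k)\big]^2 = t_{k+1} - t_k.$$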


14.2 First Variation 


Quadratic variation is a measure of volatility. First we will consider first variation, $FV(f)$, of a function $f(t)$.

Figure 14.1: Example function $f(t)$.


For the function pictured in Fig. 14.1, the first variation over the interval $[0, T]$ is given by:

$$FV_{[0,T]}(f) = [f(t_1) - f(0)] - [f(t_2) - f(t_1)] + [f(T) - f(t_2)].$$

Thus, first variation measures the total amount of up and down motion of the path.

The general definition of first variation is as follows:
Definition 14.1 (First Variation) Let $\Pi = \{t_0, t_1, \ldots, t_n\}$ be a partition of $[0, T]$, i.e.,

$$0 = t_0 \le t_1 \le \cdots \le t_n = T.$$

The mesh of the partition is defined to be

$$\|\Pi\| = \max_{k = 0, \ldots, n-1} (t_{k+1} - t_k).$$

We then define

$$FV_{[0,T]}(f) = \lim_{\|\Pi\| \to 0} \sum_{k=0}^{n-1} |f(t_{k+1}) - f(t_k)|.$$

Suppose f is differentiable. Then the Mean Value Theorem implies that in each subinterval $[t_k, t_{k+1}]$, there is a point $t_k^*$ such that

$$f(t_{k+1}) - f(t_k) = f'(t_k^*)(t_{k+1} - t_k).$$

Then

$$\sum_{k=0}^{n-1} |f(t_{k+1}) - f(t_k)| = \sum_{k=0}^{n-1} |f'(t_k^*)|(t_{k+1} - t_k),$$

and

$$FV_{[0,T]}(f) = \lim_{\|\Pi\| \to 0} \sum_{k=0}^{n-1} |f'(t_k^*)|(t_{k+1} - t_k) = \int_0^T |f'(t)|\, dt.$$


14.3 Quadratic Variation 
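Definition 14.2 (Quadratic Variation) The quadratic variation of a function f on an interval $[0, T]$ is

$$\langle f \rangle(T) = \lim_{\|\Pi\| \to 0} \sum_{k=0}^{n-1} |f(t_{k+1}) - f(t_k)|^2.$$

Remark 14.1 (Quadratic Variation of Differentiable Functions) If f is differentiable, then $\langle f \rangle(T) = 0$, because

$$\sum_{k=0}^{n-1} |f(t_{k+1}) - f(t_k)|^2 = \sum_{k=0}^{n-1} |f'(t_k^*)|^2 (t_{k+1} - t_k)^2 \le \|\Pi\| \cdot \sum_{k=0}^{n-1} |f'(t_k^*)|^2 (t_{k+1} - t_k)$$

and

$$\langle f \rangle(T) \le \lim_{\|\Pi\| \to 0} \|\Pi\| \cdot \lim_{\|\Pi\| \to 0} \sum_{k=0}^{n-1} |f'(t_k^*)|^2 (t_{k+1} - t_k) = \lim_{\|\Pi\| \to 0} \|\Pi\| \int_0^T |f'(t)|^2\, dt = 0.$$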
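For a concrete smooth function, both sums can be computed on uniform partitions. With $f = \sin$ on $[0, 2\pi]$ (an illustrative choice), the first variation is $\int_0^{2\pi} |\cos t|\, dt = 4$, while the sum of squared increments shrinks roughly in proportion to the mesh, as Remark 14.1 predicts. The function names are hypothetical.

```python
import math

def first_variation(f, T, n):
    """Sum of |f(t_{k+1}) - f(t_k)| over the uniform partition t_k = kT/n."""
    return sum(abs(f((k + 1) * T / n) - f(k * T / n)) for k in range(n))

def quadratic_variation(f, T, n):
    """Sum of |f(t_{k+1}) - f(t_k)|^2 over the same partition."""
    return sum((f((k + 1) * T / n) - f(k * T / n)) ** 2 for k in range(n))

T = 2 * math.pi
for n in (100, 10_000):
    # first variation stays near 4; quadratic variation decreases toward 0 as n grows
    print(n, first_variation(math.sin, T, n), quadratic_variation(math.sin, T, n))
```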
Theorem 3.44

$$\langle B \rangle(T) = T,$$

or more precisely,

$$\mathbb{P}\{\omega \in \Omega;\ \langle B(\cdot, \omega) \rangle(T) = T\} = 1.$$

In particular, the paths of Brownian motion are not differentiable.
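Proof: (Outline) Let $\Pi = \{t_0, t_1, \ldots, t_n\}$ be a partition of $[0, T]$. To simplify notation, set $D_k = B(t_{k+1}) - B(t_k)$. Define the sample quadratic variation

$$Q_\Pi = \sum_{k=0}^{n-1} D_k^2.$$

Then

$$Q_\Pi - T = \sum_{k=0}^{n-1} \big[D_k^2 - (t_{k+1} - t_k)\big].$$

We want to show that

$$\lim_{\|\Pi\| \to 0} (Q_\Pi - T) = 0.$$

Consider an individual summand

$$D_k^2 - (t_{k+1} - t_k) = [B(t_{k+1}) - B(t_k)]^2 - (t_{k+1} - t_k).$$

This has expectation 0, so

$$\mathbb{E}(Q_\Pi - T) = \mathbb{E} \sum_{k=0}^{n-1} \big[D_k^2 - (t_{k+1} - t_k)\big] = 0.$$

For $j \ne k$, the terms

$$D_j^2 - (t_{j+1} - t_j) \quad \text{and} \quad D_k^2 - (t_{k+1} - t_k)$$

are independent, so

$$\begin{aligned} \operatorname{var}(Q_\Pi - T) &= \sum_{k=0}^{n-1} \operatorname{var}\big[D_k^2 - (t_{k+1} - t_k)\big] \\ &= \sum_{k=0}^{n-1} \mathbb{E}\big[D_k^4 - 2(t_{k+1} - t_k) D_k^2 + (t_{k+1} - t_k)^2\big] \\ &= \sum_{k=0}^{n-1} \big[3(t_{k+1} - t_k)^2 - 2(t_{k+1} - t_k)^2 + (t_{k+1} - t_k)^2\big] && \text{(if } X \text{ is normal with mean 0 and variance } \sigma^2 \text{, then } \mathbb{E}(X^4) = 3\sigma^4) \\ &= 2 \sum_{k=0}^{n-1} (t_{k+1} - t_k)^2 \\ &\le 2\|\Pi\| \sum_{k=0}^{n-1} (t_{k+1} - t_k) \\ &= 2\|\Pi\|\, T. \end{aligned}$$

Thus we have

$$\mathbb{E}(Q_\Pi - T) = 0, \qquad \operatorname{var}(Q_\Pi - T) \le 2\|\Pi\|\, T.$$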
As $\|\Pi\| \to 0$, $\operatorname{var}(Q_\Pi - T) \to 0$, so

$$\lim_{\|\Pi\| \to 0} (Q_\Pi - T) = 0$$

in the mean-square sense. ∎
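A simulated path makes the contrast with the smooth case visible. This sketch draws $n$ independent $N(0, T/n)$ increments with Python's `random` module (an assumed stand-in for a Brownian path sampled on a uniform grid); the sample quadratic variation $Q_\Pi$ lands close to $T$, with variance $2T^2/n$ as computed above, while the sample first variation grows like $\sqrt{n}$.

```python
import math
import random

def brownian_increments(T, n, seed=11):
    """n independent N(0, T/n) increments of a Brownian path on a uniform grid of [0, T]."""
    rng = random.Random(seed)
    sd = math.sqrt(T / n)
    return [rng.gauss(0.0, sd) for _ in range(n)]

T, n = 1.0, 100_000
D = brownian_increments(T, n)
Q = sum(d * d for d in D)      # sample quadratic variation Q_Pi, close to T
FV = sum(abs(d) for d in D)    # sample first variation, about sqrt(2 n T / pi)
print(Q, FV)
```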


Remark 14.2 (Differential Representation) We know that

$$\mathbb{E}\big[(B(t_{k+1}) - B(t_k))^2 - (t_{k+1} - t_k)\big] = 0.$$

We showed above that

$$\operatorname{var}\big[(B(t_{k+1}) - B(t_k))^2 - (t_{k+1} - t_k)\big] = 2(t_{k+1} - t_k)^2.$$

When $(t_{k+1} - t_k)$ is small, $(t_{k+1} - t_k)^2$ is very small, and we have the approximate equation

$$(B(t_{k+1}) - B(t_k))^2 \approx t_{k+1} - t_k,$$

which we can write informally as

$$dB(t)\, dB(t) = dt.$$


14.4 Quadratic Variation as Absolute Volatility 


On any time interval $[T_1, T_2]$, we can sample the Brownian motion at times

$$T_1 = t_0 \le t_1 \le \cdots \le t_n = T_2$$

and compute the squared sample absolute volatility

$$\frac{1}{T_2 - T_1} \sum_{k=0}^{n-1} \big(B(t_{k+1}) - B(t_k)\big)^2.$$

This is approximately equal to

$$\frac{1}{T_2 - T_1} \big[\langle B \rangle(T_2) - \langle B \rangle(T_1)\big] = \frac{T_2 - T_1}{T_2 - T_1} = 1.$$

As we increase the number of sample points, this approximation becomes exact. In other words, Brownian motion has absolute volatility 1.

Furthermore, consider the equation

$$\langle B \rangle(T) = T = \int_0^T 1\, dt, \quad \forall T \ge 0.$$

This says that quadratic variation for Brownian motion accumulates at rate 1 at all times along almost every path.
14.5 Construction of the Ito Integral


The integrator is Brownian motion $B(t)$, $t \ge 0$, with associated filtration $\mathcal{F}(t)$, $t \ge 0$, and the following properties:

1. $s \le t \implies$ every set in $\mathcal{F}(s)$ is also in $\mathcal{F}(t)$,
2. $B(t)$ is $\mathcal{F}(t)$-measurable, $\forall t$,

3. For $t \le t_1 \le \cdots \le t_n$, the increments $B(t_1) - B(t), B(t_2) - B(t_1), \ldots, B(t_n) - B(t_{n-1})$ are independent of $\mathcal{F}(t)$.

The integrand is $\delta(t)$, $t \ge 0$, where

1. $\delta(t)$ is $\mathcal{F}(t)$-measurable $\forall t$ (i.e., δ is adapted),

2. δ is square-integrable:

$$\mathbb{E} \int_0^T \delta^2(t)\, dt < \infty, \quad \forall T.$$

We want to define the Itô Integral:

$$I(t) = \int_0^t \delta(u)\, dB(u), \quad t \ge 0.$$

Remark 14.3 (Integral w.r.t. a differentiable function) If $f(t)$ is a differentiable function, then we can define

$$\int_0^t \delta(u)\, df(u) = \int_0^t \delta(u) f'(u)\, du.$$

This won't work when the integrator is Brownian motion, because the paths of Brownian motion are not differentiable.


14.6 Itô integral of an elementary integrand 
Let $\Pi = \{t_0, t_1, \ldots, t_n\}$ be a partition of $[0, T]$, i.e.,

$$0 = t_0 \le t_1 \le \cdots \le t_n = T.$$

Assume that $\delta(t)$ is constant on each subinterval $[t_k, t_{k+1})$ (see Fig. 14.2). We call such a δ an elementary process.


The functions $B(t)$ and $\delta(t_k)$ can be interpreted as follows:

• Think of $B(t)$ as the price per unit share of an asset at time t.
• Think of $t_0, t_1, \ldots, t_n$ as the trading dates for the asset.
• Think of $\delta(t_k)$ as the number of shares of the asset acquired at trading date $t_k$ and held until trading date $t_{k+1}$.

Figure 14.2: An elementary function δ.


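Then the Itô integral $I(t)$ can be interpreted as the gain from trading at time t; this gain is given by:

$$I(t) = \begin{cases} \delta(t_0)\big[B(t) - \underbrace{B(t_0)}_{= B(0) = 0}\big], & 0 \le t \le t_1, \\ \delta(t_0)[B(t_1) - B(t_0)] + \delta(t_1)[B(t) - B(t_1)], & t_1 \le t \le t_2, \\ \delta(t_0)[B(t_1) - B(t_0)] + \delta(t_1)[B(t_2) - B(t_1)] + \delta(t_2)[B(t) - B(t_2)], & t_2 \le t \le t_3. \end{cases}$$

In general, if $t_k \le t \le t_{k+1}$,

$$I(t) = \sum_{j=0}^{k-1} \delta(t_j)\big[B(t_{j+1}) - B(t_j)\big] + \delta(t_k)\big[B(t) - B(t_k)\big].$$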
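The general formula translates directly into code. In this sketch the partition and the values $\delta(t_0), \ldots, \delta(t_{n-1})$ are Python lists and the path $B$ is any callable; that interface, and the name `elementary_ito_integral`, are hypothetical. The usage line passes the deterministic function $t \mapsto t^2$ in place of a Brownian path purely to check the arithmetic by hand.

```python
def elementary_ito_integral(partition, delta, B, t):
    """I(t) = sum_{j<k} delta[j] (B(t_{j+1}) - B(t_j)) + delta[k] (B(t) - B(t_k))
    for t_k <= t <= t_{k+1}.

    partition: [t_0, ..., t_n]; delta: [delta(t_0), ..., delta(t_{n-1})];
    B: callable giving the path value at a time.
    """
    total = 0.0
    for j in range(len(partition) - 1):
        left, right = partition[j], partition[j + 1]
        if t <= left:          # trading dates at or after t contribute nothing yet
            break
        total += delta[j] * (B(min(t, right)) - B(left))   # gain on [t_j, min(t, t_{j+1})]
    return total

path = lambda s: s * s         # stand-in path, only to check the arithmetic
print(elementary_ito_integral([0, 1, 2, 3], [1.0, -2.0, 0.5], path, 2.5))
```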


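14.7 Properties of the Itô integral of an elementary process

Adaptedness For each t, $I(t)$ is $\mathcal{F}(t)$-measurable.

Linearity If

$$I(t) = \int_0^t \delta(u)\, dB(u), \qquad J(t) = \int_0^t \gamma(u)\, dB(u),$$

then

$$I(t) \pm J(t) = \int_0^t \big(\delta(u) \pm \gamma(u)\big)\, dB(u)$$

and

$$c\, I(t) = \int_0^t c\, \delta(u)\, dB(u).$$

Martingale $I(t)$ is a martingale.

Figure 14.3: Showing s and t in different partitions.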


We prove the martingale property for the elementary process case.

Theorem 7.45 (Martingale Property)

$$I(t) = \sum_{j=0}^{k-1} \delta(t_j)\big[B(t_{j+1}) - B(t_j)\big] + \delta(t_k)\big[B(t) - B(t_k)\big], \quad t_k \le t \le t_{k+1},$$

is a martingale.


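Proof: Let $0 \le s \le t$ be given. We treat the more difficult case that s and t are in different subintervals, i.e., there are partition points $t_\ell$ and $t_k$ such that $s \in [t_\ell, t_{\ell+1}]$ and $t \in [t_k, t_{k+1}]$ (see Fig. 14.3).

Write

$$I(t) = \sum_{j=0}^{\ell-1} \delta(t_j)\big[B(t_{j+1}) - B(t_j)\big] + \delta(t_\ell)\big[B(t_{\ell+1}) - B(t_\ell)\big] + \sum_{j=\ell+1}^{k-1} \delta(t_j)\big[B(t_{j+1}) - B(t_j)\big] + \delta(t_k)\big[B(t) - B(t_k)\big].$$

We compute conditional expectations:

$$\mathbb{E}\Big[\sum_{j=0}^{\ell-1} \delta(t_j)\big(B(t_{j+1}) - B(t_j)\big) \,\Big|\, \mathcal{F}(s)\Big] = \sum_{j=0}^{\ell-1} \delta(t_j)\big(B(t_{j+1}) - B(t_j)\big),$$

$$\mathbb{E}\Big[\delta(t_\ell)\big(B(t_{\ell+1}) - B(t_\ell)\big) \,\Big|\, \mathcal{F}(s)\Big] = \delta(t_\ell)\big(\mathbb{E}[B(t_{\ell+1}) \mid \mathcal{F}(s)] - B(t_\ell)\big) = \delta(t_\ell)\big[B(s) - B(t_\ell)\big].$$

These first two terms add up to $I(s)$. We show that the third and fourth terms are zero.

$$\mathbb{E}\Big[\sum_{j=\ell+1}^{k-1} \delta(t_j)\big(B(t_{j+1}) - B(t_j)\big) \,\Big|\, \mathcal{F}(s)\Big] = \sum_{j=\ell+1}^{k-1} \mathbb{E}\Big[\delta(t_j)\, \underbrace{\mathbb{E}\big[B(t_{j+1}) - B(t_j) \mid \mathcal{F}(t_j)\big]}_{= 0} \,\Big|\, \mathcal{F}(s)\Big] = 0,$$

$$\mathbb{E}\Big[\delta(t_k)\big(B(t) - B(t_k)\big) \,\Big|\, \mathcal{F}(s)\Big] = \mathbb{E}\Big[\delta(t_k)\, \underbrace{\mathbb{E}\big[B(t) - B(t_k) \mid \mathcal{F}(t_k)\big]}_{= 0} \,\Big|\, \mathcal{F}(s)\Big] = 0.$$

∎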
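Theorem 7.46 (Itô Isometry)

$$\mathbb{E} I^2(t) = \mathbb{E} \int_0^t \delta^2(u)\, du.$$

Proof: To simplify notation, assume $t = t_{k+1}$, so

$$I(t) = \sum_{j=0}^{k} \delta(t_j) \underbrace{\big[B(t_{j+1}) - B(t_j)\big]}_{D_j}.$$

Each $D_j$ has expectation 0, and $D_j$ is independent of $\mathcal{F}(t_j)$.

$$I^2(t) = \Big(\sum_{j=0}^{k} \delta(t_j) D_j\Big)^2 = \sum_{j=0}^{k} \delta^2(t_j) D_j^2 + 2 \sum_{i < j} \delta(t_i) \delta(t_j) D_i D_j.$$

Since the cross terms have expectation zero (for $i < j$, the factor $\delta(t_i)\delta(t_j) D_i$ is $\mathcal{F}(t_j)$-measurable, while $D_j$ is independent of $\mathcal{F}(t_j)$ with mean 0),

$$\begin{aligned} \mathbb{E} I^2(t) &= \sum_{j=0}^{k} \mathbb{E}\big[\delta^2(t_j) D_j^2\big] \\ &= \sum_{j=0}^{k} \mathbb{E}\Big[\delta^2(t_j)\, \mathbb{E}\big[(B(t_{j+1}) - B(t_j))^2 \mid \mathcal{F}(t_j)\big]\Big] \\ &= \sum_{j=0}^{k} \mathbb{E}\, \delta^2(t_j)(t_{j+1} - t_j) \\ &= \mathbb{E} \sum_{j=0}^{k} \int_{t_j}^{t_{j+1}} \delta^2(u)\, du \\ &= \mathbb{E} \int_0^t \delta^2(u)\, du. \end{aligned}$$

∎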
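The isometry can be checked by Monte Carlo. This sketch uses the adapted elementary integrand $\delta(t_j) = B(t_j)$ on the grid $t_j = jT/m$ (an illustrative choice), so the right-hand side is exactly $\sum_j \mathbb{E} B^2(t_j)(t_{j+1} - t_j) = \sum_j t_j\, T/m$, which equals $0.45$ for $m = 10$, $T = 1$. The sample size and the function name are assumptions.

```python
import math
import random

def isometry_check(m=10, T=1.0, n_samples=50_000, seed=5):
    """Return (Monte Carlo estimate of E I^2(T), exact E int_0^T delta^2(u) du)
    for the elementary integrand delta(t_j) = B(t_j) on t_j = jT/m."""
    rng = random.Random(seed)
    h = T / m
    sum_sq = 0.0
    for _ in range(n_samples):
        b, integral = 0.0, 0.0
        for _ in range(m):
            d = rng.gauss(0.0, math.sqrt(h))   # D_j = B(t_{j+1}) - B(t_j)
            integral += b * d                  # delta(t_j) D_j with delta(t_j) = B(t_j)
            b += d
        sum_sq += integral ** 2
    rhs = sum(j * h * h for j in range(m))     # sum_j E[B(t_j)^2] h = sum_j t_j h
    return sum_sq / n_samples, rhs

lhs, rhs = isometry_check()
print(lhs, rhs)
```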
Figure 14.4: Approximating a general process by an elementary process $\delta_4$, over $[0, T]$.


14.8 Itô integral of a general integrand 
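Fix $T > 0$. Let δ be a process (not necessarily an elementary process) such that

• $\delta(t)$ is $\mathcal{F}(t)$-measurable, $\forall t \in [0, T]$,
• $\mathbb{E} \int_0^T \delta^2(t)\, dt < \infty$.

Theorem 8.47 There is a sequence of elementary processes $\{\delta_n\}_{n=1}^\infty$ such that

$$\lim_{n \to \infty} \mathbb{E} \int_0^T |\delta_n(t) - \delta(t)|^2\, dt = 0.$$

Proof: Fig. 14.4 shows the main idea.

In the last section we have defined

$$I_n(T) = \int_0^T \delta_n(t)\, dB(t)$$

for every n. We now define

$$\int_0^T \delta(t)\, dB(t) = \lim_{n \to \infty} I_n(T).$$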
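The only difficulty with this approach is that we need to make sure the above limit exists. Suppose n and m are large positive integers. Then

$$\begin{aligned} \operatorname{var}\big(I_n(T) - I_m(T)\big) &= \mathbb{E}\Big(\int_0^T [\delta_n(t) - \delta_m(t)]\, dB(t)\Big)^2 \\ &= \mathbb{E} \int_0^T [\delta_n(t) - \delta_m(t)]^2\, dt && \text{(Itô Isometry)} \\ &\le \mathbb{E} \int_0^T \big[|\delta_n(t) - \delta(t)| + |\delta(t) - \delta_m(t)|\big]^2\, dt \\ &\le 2\mathbb{E} \int_0^T |\delta_n(t) - \delta(t)|^2\, dt + 2\mathbb{E} \int_0^T |\delta_m(t) - \delta(t)|^2\, dt, && \big((a + b)^2 \le 2a^2 + 2b^2\big) \end{aligned}$$

which is small. This shows that $\{I_n(T)\}_{n=1}^\infty$ is a Cauchy sequence in mean square, which guarantees that it has a limit.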


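14.9 Properties of the (general) Itô integral

Here δ is any adapted, square-integrable process.

Adaptedness. For each t, $I(t)$ is $\mathcal{F}(t)$-measurable.

Linearity. If

$$I(t) = \int_0^t \delta(u)\, dB(u), \qquad J(t) = \int_0^t \gamma(u)\, dB(u),$$

then

$$I(t) \pm J(t) = \int_0^t \big(\delta(u) \pm \gamma(u)\big)\, dB(u)$$

and

$$c\, I(t) = \int_0^t c\, \delta(u)\, dB(u).$$

Martingale. $I(t)$ is a martingale.
Continuity. $I(t)$ is a continuous function of the upper limit of integration t.
Itô Isometry. $\mathbb{E} I^2(t) = \mathbb{E} \int_0^t \delta^2(u)\, du$.

Example 14.1 Consider the Itô integral

$$\int_0^T B(u)\, dB(u).$$

We approximate the integrand as shown in Fig. 14.5.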
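Figure 14.5: Approximating the integrand $B(u)$ with $\delta_4$, over $[0, T]$.

$$\delta_n(u) = \begin{cases} B(0) = 0 & \text{if } 0 \le u < T/n, \\ B(T/n) & \text{if } T/n \le u < 2T/n, \\ \quad \vdots \\ B\big(\tfrac{(n-1)T}{n}\big) & \text{if } \tfrac{(n-1)T}{n} \le u \le T. \end{cases}$$

By definition,

$$\int_0^T B(u)\, dB(u) = \lim_{n \to \infty} \sum_{k=0}^{n-1} B\Big(\frac{kT}{n}\Big)\Big[B\Big(\frac{(k+1)T}{n}\Big) - B\Big(\frac{kT}{n}\Big)\Big].$$

To simplify notation, we denote

$$B_k = B\Big(\frac{kT}{n}\Big),$$

so

$$\int_0^T B(u)\, dB(u) = \lim_{n \to \infty} \sum_{k=0}^{n-1} B_k (B_{k+1} - B_k).$$

We compute (using $B_0 = 0$ in the second line)

$$\begin{aligned} \tfrac12 \sum_{k=0}^{n-1} (B_{k+1} - B_k)^2 &= \tfrac12 \sum_{k=0}^{n-1} B_{k+1}^2 - \sum_{k=0}^{n-1} B_k B_{k+1} + \tfrac12 \sum_{k=0}^{n-1} B_k^2 \\ &= \tfrac12 B_n^2 + \tfrac12 \sum_{j=0}^{n-1} B_j^2 - \sum_{k=0}^{n-1} B_k B_{k+1} + \tfrac12 \sum_{k=0}^{n-1} B_k^2 \\ &= \tfrac12 B_n^2 + \sum_{k=0}^{n-1} B_k^2 - \sum_{k=0}^{n-1} B_k B_{k+1} \\ &= \tfrac12 B_n^2 - \sum_{k=0}^{n-1} B_k (B_{k+1} - B_k). \end{aligned}$$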
k=0 
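Therefore,

$$\sum_{k=0}^{n-1} B_k (B_{k+1} - B_k) = \tfrac12 B_n^2 - \tfrac12 \sum_{k=0}^{n-1} (B_{k+1} - B_k)^2,$$

or equivalently

$$\sum_{k=0}^{n-1} B\Big(\frac{kT}{n}\Big)\Big[B\Big(\frac{(k+1)T}{n}\Big) - B\Big(\frac{kT}{n}\Big)\Big] = \tfrac12 B^2(T) - \tfrac12 \sum_{k=0}^{n-1} \Big[B\Big(\frac{(k+1)T}{n}\Big) - B\Big(\frac{kT}{n}\Big)\Big]^2.$$

Let $n \to \infty$ and use the definition of quadratic variation to get

$$\int_0^T B(u)\, dB(u) = \tfrac12 B^2(T) - \tfrac12 T.$$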
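On a single simulated path, the left-endpoint sum satisfies the algebraic identity above exactly (up to rounding), and for a fine grid it is close to $\frac12 B^2(T) - \frac12 T$. The path is generated from independent $N(0, T/n)$ increments with Python's `random` module; grid size, seed, and function name are illustrative assumptions.

```python
import math
import random

def ito_sum_vs_formula(T=1.0, n=100_000, seed=13):
    """Return (sum_k B_k (B_{k+1} - B_k),
               1/2 B_n^2 - 1/2 sum_k (B_{k+1} - B_k)^2,   # exact identity
               1/2 B(T)^2 - T/2)                          # limit as n -> infinity
    on one simulated path."""
    rng = random.Random(seed)
    sd = math.sqrt(T / n)
    b, ito_sum, qv = 0.0, 0.0, 0.0
    for _ in range(n):
        d = rng.gauss(0.0, sd)
        ito_sum += b * d        # integrand evaluated at the LEFT endpoint
        qv += d * d
        b += d
    return ito_sum, 0.5 * b * b - 0.5 * qv, 0.5 * b * b - 0.5 * T

s, identity_rhs, limit_rhs = ito_sum_vs_formula()
print(s, identity_rhs, limit_rhs)
```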


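Remark 14.4 (Reason for the $\frac12 T$ term) If f is differentiable with $f(0) = 0$, then

$$\int_0^T f(u)\, df(u) = \int_0^T f(u) f'(u)\, du = \tfrac12 f^2(u)\Big|_0^T = \tfrac12 f^2(T).$$

In contrast, for Brownian motion, we have

$$\int_0^T B(u)\, dB(u) = \tfrac12 B^2(T) - \tfrac12 T.$$

The extra term $\frac12 T$ comes from the nonzero quadratic variation of Brownian motion. It has to be there, because

$$\mathbb{E} \int_0^T B(u)\, dB(u) = 0 \quad \text{(Itô integral is a martingale)}$$

but

$$\mathbb{E}\, \tfrac12 B^2(T) = \tfrac12 T.$$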


14.10 Quadratic variation of an Itô integral
Theorem 10.48 (Quadratic variation of Itô integral) Let

$$I(t) = \int_0^t \delta(u)\, dB(u).$$

Then

$$\langle I \rangle(t) = \int_0^t \delta^2(u)\, du.$$

This holds even if δ is not an elementary process. The quadratic variation formula says that at each time u, the instantaneous absolute volatility of I is $\delta^2(u)$. This is the absolute volatility of the Brownian motion scaled by the square of the size of the position (i.e., $\delta(t)$) in the Brownian motion. Informally, we can write the quadratic variation formula in differential form as follows:

$$dI(t)\, dI(t) = \delta^2(t)\, dt.$$

Compare this with

$$dB(t)\, dB(t) = dt.$$


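Proof: (For an elementary process δ.) Let $\Pi = \{t_0, t_1, \ldots, t_n\}$ be the partition for δ, i.e., $\delta(t) = \delta(t_k)$ for $t_k \le t < t_{k+1}$. To simplify notation, assume $t = t_n$. We have

$$\langle I \rangle(t) = \sum_{k=0}^{n-1} \big[\langle I \rangle(t_{k+1}) - \langle I \rangle(t_k)\big].$$

Let us compute $\langle I \rangle(t_{k+1}) - \langle I \rangle(t_k)$. Let $\Xi = \{s_0, s_1, \ldots, s_m\}$ be a partition

$$t_k = s_0 \le s_1 \le \cdots \le s_m = t_{k+1}.$$

Then

$$I(s_{i+1}) - I(s_i) = \int_{s_i}^{s_{i+1}} \delta(t_k)\, dB(u) = \delta(t_k)\big[B(s_{i+1}) - B(s_i)\big],$$

so

$$\langle I \rangle(t_{k+1}) - \langle I \rangle(t_k) = \lim_{\|\Xi\| \to 0} \sum_{i=0}^{m-1} \big[I(s_{i+1}) - I(s_i)\big]^2 = \delta^2(t_k) \lim_{\|\Xi\| \to 0} \sum_{i=0}^{m-1} \big[B(s_{i+1}) - B(s_i)\big]^2 = \delta^2(t_k)(t_{k+1} - t_k).$$

It follows that

$$\langle I \rangle(t) = \sum_{k=0}^{n-1} \delta^2(t_k)(t_{k+1} - t_k) = \sum_{k=0}^{n-1} \int_{t_k}^{t_{k+1}} \delta^2(u)\, du = \int_0^t \delta^2(u)\, du.$$

∎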


Chapter 15 


Ito’s Formula 


15.1 Itô's formula for one Brownian motion


We want a rule to "differentiate" expressions of the form $f(B(t))$, where $f(x)$ is a twice-differentiable function. If $B(t)$ were also differentiable, then the ordinary chain rule would give

$$\frac{d}{dt} f(B(t)) = f'(B(t))\, B'(t),$$

which could be written in differential notation as

$$df(B(t)) = f'(B(t))\, B'(t)\, dt = f'(B(t))\, dB(t).$$

However, $B(t)$ is not differentiable, and in particular has nonzero quadratic variation, so the correct formula has an extra term, namely,

$$df(B(t)) = f'(B(t))\, dB(t) + \tfrac12 f''(B(t))\, \underbrace{dB(t)\, dB(t)}_{dt}.$$

This is Itô's formula in differential form. Integrating this, we obtain Itô's formula in integral form:

$$f(B(t)) - \underbrace{f(B(0))}_{f(0)} = \int_0^t f'(B(u))\, dB(u) + \tfrac12 \int_0^t f''(B(u))\, du.$$

Remark 15.1 (Differential vs. Integral Forms) The mathematically meaningful form of Itô's formula is Itô's formula in integral form:

$$f(B(t)) - f(B(0)) = \int_0^t f'(B(u))\, dB(u) + \tfrac12 \int_0^t f''(B(u))\, du.$$
This is because we have solid definitions for both integrals appearing on the right-hand side. The first,

$$\int_0^t f'(B(u))\, dB(u),$$

is an Itô integral, defined in the previous chapter. The second,

$$\int_0^t f''(B(u))\, du,$$

is a Riemann integral, the type used in freshman calculus.

For paper and pencil computations, the more convenient form of Itô's rule is Itô's formula in differential form:

$$df(B(t)) = f'(B(t))\, dB(t) + \tfrac12 f''(B(t))\, dt.$$

There is an intuitive meaning but no solid definition for the terms $df(B(t))$, $dB(t)$ and $dt$ appearing in this formula. This formula becomes mathematically respectable only after we integrate it.


15.2 Derivation of Itô’s formula 
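Consider $f(x) = \frac12 x^2$, so that

$$f'(x) = x, \qquad f''(x) = 1.$$

Let $x_k, x_{k+1}$ be numbers. Taylor's formula implies

$$f(x_{k+1}) - f(x_k) = (x_{k+1} - x_k) f'(x_k) + \tfrac12 (x_{k+1} - x_k)^2 f''(x_k).$$

In this case, Taylor's formula to second order is exact because f is a quadratic function.

In the general case, the above equation is only approximate, and the error is of the order of $(x_{k+1} - x_k)^3$. The total error will have limit zero in the last step of the following argument.

Fix $T > 0$ and let $\Pi = \{t_0, t_1, \ldots, t_n\}$ be a partition of $[0, T]$. Using Taylor's formula, we write:

$$\begin{aligned} f(B(T)) - f(B(0)) &= \tfrac12 B^2(T) - \tfrac12 B^2(0) \\ &= \sum_{k=0}^{n-1} \big[f(B(t_{k+1})) - f(B(t_k))\big] \\ &= \sum_{k=0}^{n-1} \big[B(t_{k+1}) - B(t_k)\big] f'(B(t_k)) + \tfrac12 \sum_{k=0}^{n-1} \big[B(t_{k+1}) - B(t_k)\big]^2 f''(B(t_k)) \\ &= \sum_{k=0}^{n-1} B(t_k)\big[B(t_{k+1}) - B(t_k)\big] + \tfrac12 \sum_{k=0}^{n-1} \big[B(t_{k+1}) - B(t_k)\big]^2. \end{aligned}$$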
We let $\|\Pi\| \to 0$ to obtain

$$\begin{aligned} f(B(T)) - f(B(0)) &= \int_0^T B(u)\, dB(u) + \tfrac12 \underbrace{\langle B \rangle(T)}_{T} \\ &= \int_0^T f'(B(u))\, dB(u) + \tfrac12 \int_0^T \underbrace{f''(B(u))}_{1}\, du. \end{aligned}$$

This is Itô's formula in integral form for the special case

$$f(x) = \tfrac12 x^2.$$


15.3 Geometric Brownian motion 


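Definition 15.1 (Geometric Brownian Motion) Geometric Brownian motion is

$$S(t) = S(0) \exp\big\{\sigma B(t) + (\mu - \tfrac12 \sigma^2) t\big\},$$

where μ and σ > 0 are constant.

Define

$$f(t, x) = S(0) \exp\big\{\sigma x + (\mu - \tfrac12 \sigma^2) t\big\},$$

so

$$S(t) = f(t, B(t)).$$

Then

$$f_t = (\mu - \tfrac12 \sigma^2) f, \qquad f_x = \sigma f, \qquad f_{xx} = \sigma^2 f.$$

According to Itô's formula,

$$\begin{aligned} dS(t) &= df(t, B(t)) \\ &= f_t\, dt + f_x\, dB + \tfrac12 f_{xx} \underbrace{dB\, dB}_{dt} \\ &= (\mu - \tfrac12 \sigma^2) f\, dt + \sigma f\, dB + \tfrac12 \sigma^2 f\, dt \\ &= \mu S(t)\, dt + \sigma S(t)\, dB(t). \end{aligned}$$

Thus, Geometric Brownian motion in differential form is

$$dS(t) = \mu S(t)\, dt + \sigma S(t)\, dB(t),$$

and Geometric Brownian motion in integral form is

$$S(t) = S(0) + \int_0^t \mu S(u)\, du + \int_0^t \sigma S(u)\, dB(u).$$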
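One way to see that the differential form and the closed form describe the same process is to drive both with the same Brownian increments. The sketch below uses the Euler-Maruyama scheme $S_{k+1} = S_k + \mu S_k h + \sigma S_k \Delta B_k$, a standard discretization that is not part of these notes, and compares its endpoint with $S(0)\exp\{\sigma B(T) + (\mu - \frac12\sigma^2)T\}$. Parameter values, grid size, and seed are illustrative assumptions.

```python
import math
import random

def gbm_euler_vs_exact(S0=100.0, mu=0.1, sigma=0.2, T=1.0, n=100_000, seed=17):
    """Euler-Maruyama endpoint for dS = mu S dt + sigma S dB, and the closed form
    S0 exp{sigma B(T) + (mu - sigma^2/2) T}, both on the same Brownian path."""
    rng = random.Random(seed)
    h = T / n
    s_euler, b = S0, 0.0
    for _ in range(n):
        d = rng.gauss(0.0, math.sqrt(h))              # Brownian increment over one step
        s_euler += mu * s_euler * h + sigma * s_euler * d
        b += d                                        # running value of B(t)
    s_exact = S0 * math.exp(sigma * b + (mu - 0.5 * sigma**2) * T)
    return s_euler, s_exact

s_euler, s_exact = gbm_euler_vs_exact()
print(s_euler, s_exact)
```

Dropping the $-\frac12\sigma^2$ correction from the closed form makes the two endpoints drift apart systematically, which is another view of the extra Itô term.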
15.4 Quadratic variation of geometric Brownian motion
In the integral form of Geometric Brownian motion,

$$S(t) = S(0) + \int_0^t \mu S(u)\, du + \int_0^t \sigma S(u)\, dB(u),$$

the Riemann integral

$$F(t) = \int_0^t \mu S(u)\, du$$

is differentiable with $F'(t) = \mu S(t)$. This term has zero quadratic variation. The Itô integral

$$G(t) = \int_0^t \sigma S(u)\, dB(u)$$

is not differentiable. It has quadratic variation

$$\langle G \rangle(t) = \int_0^t \sigma^2 S^2(u)\, du.$$

Thus the quadratic variation of S is given by the quadratic variation of G. In differential notation, we write

$$dS(t)\, dS(t) = \big(\mu S(t)\, dt + \sigma S(t)\, dB(t)\big)^2 = \sigma^2 S^2(t)\, dt.$$

15.5 Volatility of Geometric Brownian motion 


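Fix $0 \le T_1 < T_2$. Let $\Pi = \{t_0, \ldots, t_n\}$ be a partition of $[T_1, T_2]$. The squared absolute sample volatility of S on $[T_1, T_2]$ is

$$\frac{1}{T_2 - T_1} \sum_{k=0}^{n-1} \big(S(t_{k+1}) - S(t_k)\big)^2 \approx \frac{1}{T_2 - T_1} \int_{T_1}^{T_2} \sigma^2 S^2(u)\, du \approx \sigma^2 S^2(T_1).$$

As $T_2 \downarrow T_1$, the above approximation becomes exact. In other words, the instantaneous absolute volatility of S is $\sigma^2 S^2(T_1)$, and dividing by $S^2(T_1)$, the instantaneous relative volatility of S is $\sigma^2$. This is usually called simply the volatility of S.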


15.6 First derivation of the Black-Scholes formula 


Wealth of an investor. An investor begins with nonrandom initial wealth Xy and at each time t, 
holds A(t) shares of stock. Stock is modelled by a geometric Brownian motion: 


dS(t) = uS (0) dt + 0oS(0dB(t). 
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A(t) can be random, but must be adapted. The investor finances his investing by borrowing or 
lending at interest rate r. 


Let X (t) denote the wealth of the investor at time t. Then 
dX(t) = A(t)dS(t) + r |X (t) — A(t). S(£)] dt 
= A(t) [uS (t)dt + oS(t)dB(t)] +r (X(t) — A(0)5(0)] dt 
=rX(t)dt + A(t)S(t) (u—r) dt + A(t)S(t)odB(t). 
x 
Risk premium 
Value of an option. Consider an European option which pays g (S (T)) at time T. Let v(t, x) denote 
the value of this option at time ¢ if the stock price is S(t) = x. In other words, the value of the 
option at each time t € [0, T] is 
v(t, S(t)). 
The differential of this value is 
do(t, S(t)) = vidt + vedS + EvrsdS dS 
= vidt + 07 [uS dt +05 dB] + Lusso” S? dt 
= [v + Sv, + 4075700] dt + o0Sv,dB 


A hedging portfolio starts with some initial wealth Xo and invests so that the wealth X (t) at each 
time tracks u(t, S(t)). We saw above that 


dX (t) = [rX + A(up-—r)S] dt + 0SAdB. 


To ensure that $X(t) = v(t, S(t))$ for all $t$, we equate coefficients in their differentials. Equating the $dB$ coefficients, we obtain the $\Delta$-hedging rule:

$$\Delta(t) = v_x(t, S(t)).$$

Equating the $dt$ coefficients, we obtain

$$v_t + \mu S v_x + \tfrac12\sigma^2 S^2 v_{xx} = rX + \Delta(\mu - r)S.$$

But we have set $\Delta = v_x$, and we are seeking to cause $X$ to agree with $v$. Making these substitutions, we obtain

$$v_t + \mu S v_x + \tfrac12\sigma^2 S^2 v_{xx} = rv + v_x(\mu - r)S$$

(where $v = v(t, S(t))$ and $S = S(t)$), which simplifies to

$$v_t + rS v_x + \tfrac12\sigma^2 S^2 v_{xx} = rv.$$

In conclusion, we should let $v$ be the solution to the Black-Scholes partial differential equation

$$v_t(t,x) + rx\,v_x(t,x) + \tfrac12\sigma^2 x^2 v_{xx}(t,x) = rv(t,x)$$

satisfying the terminal condition

$$v(T,x) = g(x).$$

If an investor starts with $X_0 = v(0, S(0))$ and uses the hedge $\Delta(t) = v_x(t, S(t))$, then he will have $X(t) = v(t, S(t))$ for all $t$, and in particular, $X(T) = g(S(T))$.
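The argument can be checked numerically. The Python sketch below (standard library only) takes the call payoff $g(x) = (x - K)^+$ as an illustrative assumption; for that payoff the solution of the PDE is the classical Black-Scholes formula, which this section does not derive. The sketch checks by finite differences that the formula satisfies the PDE, and then applies the $\Delta$-hedging rule on a discrete time grid. Function names, grid sizes and parameter values are illustrative choices.

```python
import math
import random


def norm_cdf(z):
    """Standard normal distribution function N(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def call_value(t, x, T, K, r, sigma):
    """Black-Scholes call value v(t, x) for g(x) = (x - K)^+, x > 0."""
    tau = T - t
    if tau <= 0.0:
        return max(x - K, 0.0)
    s = sigma * math.sqrt(tau)
    d_plus = (math.log(x / K) + (r + 0.5 * sigma**2) * tau) / s
    return x * norm_cdf(d_plus) - K * math.exp(-r * tau) * norm_cdf(d_plus - s)


def call_delta(t, x, T, K, r, sigma):
    """Hedge ratio v_x(t, x) = N(d_+) for the call (requires t < T)."""
    tau = T - t
    s = sigma * math.sqrt(tau)
    return norm_cdf((math.log(x / K) + (r + 0.5 * sigma**2) * tau) / s)


def pde_residual(t, x, T, K, r, sigma, ht=1e-4, hx=1e-2):
    """Central-difference value of v_t + r x v_x + 0.5 sigma^2 x^2 v_xx - r v."""
    v = lambda s, y: call_value(s, y, T, K, r, sigma)
    v_t = (v(t + ht, x) - v(t - ht, x)) / (2 * ht)
    v_x = (v(t, x + hx) - v(t, x - hx)) / (2 * hx)
    v_xx = (v(t, x + hx) - 2 * v(t, x) + v(t, x - hx)) / hx**2
    return v_t + r * x * v_x + 0.5 * sigma**2 * x**2 * v_xx - r * v(t, x)


def delta_hedge(S0, T, K, r, mu, sigma, n_steps, seed=0):
    """Rebalance Delta(t) = v_x(t, S(t)) at n_steps dates while S follows
    geometric Brownian motion with market drift mu. Returns (X(T), g(S(T)))."""
    rng = random.Random(seed)
    dt = T / n_steps
    S = S0
    X = call_value(0.0, S0, T, K, r, sigma)  # X_0 = v(0, S(0))
    for k in range(n_steps):
        delta = call_delta(k * dt, S, T, K, r, sigma)
        dB = rng.gauss(0.0, math.sqrt(dt))
        S_next = S * math.exp((mu - 0.5 * sigma**2) * dt + sigma * dB)
        # dX = Delta dS + r (X - Delta S) dt, discretized over one step
        X += delta * (S_next - S) + r * (X - delta * S) * dt
        S = S_next
    return X, max(S - K, 0.0)
```

Because $\mu$ cancelled out of the PDE, changing `mu` in `delta_hedge` should still leave `X_T` close to `payoff`; the remaining gap comes from rebalancing only at discrete times and shrinks as `n_steps` grows.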
15.7 Mean and variance of the Cox-Ingersoll-Ross process


The Cox-Ingersoll-Ross model for interest rates is

$$dr(t) = a(b - c\,r(t))\,dt + \sigma\sqrt{r(t)}\,dB(t),$$

where $a$, $b$, $c$, $\sigma$ and $r(0)$ are positive constants. In integral form, this equation is

$$r(t) = r(0) + a\int_0^t (b - c\,r(u))\,du + \sigma\int_0^t \sqrt{r(u)}\,dB(u).$$


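We apply Itô's formula to compute $dr^2(t)$. This is $df(r(t))$, where $f(x) = x^2$. We obtain

$$\begin{aligned} dr^2(t) &= df(r(t)) \\ &= f'(r(t))\,dr(t) + \tfrac12 f''(r(t))\,dr(t)\,dr(t) \\ &= 2r(t)\big[a(b - c\,r(t))\,dt + \sigma\sqrt{r(t)}\,dB(t)\big] + \big[a(b - c\,r(t))\,dt + \sigma\sqrt{r(t)}\,dB(t)\big]^2 \\ &= 2ab\,r(t)\,dt - 2ac\,r^2(t)\,dt + 2\sigma r^{3/2}(t)\,dB(t) + \sigma^2 r(t)\,dt \\ &= (2ab + \sigma^2)\,r(t)\,dt - 2ac\,r^2(t)\,dt + 2\sigma r^{3/2}(t)\,dB(t). \end{aligned}$$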


The mean of $r(t)$. The integral form of the CIR equation is

$$r(t) = r(0) + a\int_0^t (b - c\,r(u))\,du + \sigma\int_0^t \sqrt{r(u)}\,dB(u).$$

Taking expectations and remembering that the expectation of an Itô integral is zero, we obtain

$$\mathbb{E}\,r(t) = r(0) + a\int_0^t (b - c\,\mathbb{E}\,r(u))\,du.$$


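Differentiation yields

$$\frac{d}{dt}\mathbb{E}\,r(t) = a(b - c\,\mathbb{E}\,r(t)) = ab - ac\,\mathbb{E}\,r(t),$$

which implies that

$$\frac{d}{dt}\big[e^{act}\,\mathbb{E}\,r(t)\big] = e^{act}\Big[ac\,\mathbb{E}\,r(t) + \frac{d}{dt}\mathbb{E}\,r(t)\Big] = e^{act}ab.$$

Integration yields

$$e^{act}\,\mathbb{E}\,r(t) - r(0) = ab\int_0^t e^{acu}\,du = \frac{b}{c}\big(e^{act} - 1\big).$$

We solve for $\mathbb{E}\,r(t)$:

$$\mathbb{E}\,r(t) = \frac{b}{c} + e^{-act}\Big(r(0) - \frac{b}{c}\Big).$$

If $r(0) = \frac{b}{c}$, then $\mathbb{E}\,r(t) = \frac{b}{c}$ for every $t$. If $r(0) \neq \frac{b}{c}$, then $r(t)$ exhibits mean reversion:

$$\lim_{t\to\infty}\mathbb{E}\,r(t) = \frac{b}{c}.$$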
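A quick Monte Carlo check of the mean formula. The Python sketch below (standard library only) simulates the CIR equation with an Euler scheme and averages the simulated values of $r(t)$. The scheme, the `max(r, 0)` truncation inside the square root, and the parameter values are illustrative assumptions, not part of the text.

```python
import math
import random


def cir_mean_exact(t, a, b, c, r0):
    """E r(t) = b/c + exp(-a c t) (r(0) - b/c)."""
    return b / c + math.exp(-a * c * t) * (r0 - b / c)


def cir_mean_mc(t, a, b, c, sigma, r0, n_paths=5000, n_steps=100, seed=0):
    """Monte Carlo estimate of E r(t) from an Euler discretization of
    dr = a (b - c r) dt + sigma sqrt(r) dB.

    max(r, 0) ("full truncation") keeps the scheme defined if a
    discretized path dips below zero."""
    rng = random.Random(seed)
    dt = t / n_steps
    sqdt = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        r = r0
        for _ in range(n_steps):
            rp = max(r, 0.0)
            r += a * (b - c * rp) * dt + sigma * math.sqrt(rp) * sqdt * rng.gauss(0.0, 1.0)
        total += r
    return total / n_paths
```

With `r0` above `b / c`, increasing `t` in `cir_mean_exact` shows the mean decaying toward $b/c$, the mean reversion described above.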
Variance of $r(t)$. The integral form of the equation derived earlier for $dr^2(t)$ is

$$r^2(t) = r^2(0) + (2ab + \sigma^2)\int_0^t r(u)\,du - 2ac\int_0^t r^2(u)\,du + 2\sigma\int_0^t r^{3/2}(u)\,dB(u).$$

Taking expectations, we obtain

$$\mathbb{E}\,r^2(t) = r^2(0) + (2ab + \sigma^2)\int_0^t \mathbb{E}\,r(u)\,du - 2ac\int_0^t \mathbb{E}\,r^2(u)\,du.$$


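Differentiation yields

$$\frac{d}{dt}\mathbb{E}\,r^2(t) = (2ab + \sigma^2)\,\mathbb{E}\,r(t) - 2ac\,\mathbb{E}\,r^2(t),$$

which implies that

$$\frac{d}{dt}\big[e^{2act}\,\mathbb{E}\,r^2(t)\big] = e^{2act}\Big[2ac\,\mathbb{E}\,r^2(t) + \frac{d}{dt}\mathbb{E}\,r^2(t)\Big] = e^{2act}(2ab + \sigma^2)\,\mathbb{E}\,r(t).$$

Using the formula already derived for $\mathbb{E}\,r(t)$ and integrating the last equation, after considerable algebra we obtain

$$\operatorname{Var} r(t) = \mathbb{E}\,r^2(t) - \big(\mathbb{E}\,r(t)\big)^2 = \frac{\sigma^2}{ac}\Big(r(0) - \frac{b}{c}\Big)\big(e^{-act} - e^{-2act}\big) + \frac{b\sigma^2}{2ac^2}\big(1 - e^{-2act}\big).$$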
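The variance of the CIR process can be checked without redoing the algebra. The Python sketch below (standard library only; parameter values are arbitrary) integrates the two moment equations for $\mathbb{E}\,r(t)$ and $\mathbb{E}\,r^2(t)$ with the classical fourth-order Runge-Kutta method and compares $\mathbb{E}\,r^2(t) - (\mathbb{E}\,r(t))^2$ with the closed-form expression written out in `cir_variance_exact`.

```python
import math


def cir_moments_ode(t, a, b, c, sigma, r0, n_steps=2000):
    """Integrate with classical RK4
        m1' = ab - ac m1,
        m2' = (2ab + sigma^2) m1 - 2ac m2,
    from m1(0) = r(0), m2(0) = r(0)^2. Returns (m1(t), m2(t))."""
    def rhs(m1, m2):
        return a * b - a * c * m1, (2 * a * b + sigma**2) * m1 - 2 * a * c * m2

    h = t / n_steps
    m1, m2 = r0, r0 * r0
    for _ in range(n_steps):
        k1 = rhs(m1, m2)
        k2 = rhs(m1 + 0.5 * h * k1[0], m2 + 0.5 * h * k1[1])
        k3 = rhs(m1 + 0.5 * h * k2[0], m2 + 0.5 * h * k2[1])
        k4 = rhs(m1 + h * k3[0], m2 + h * k3[1])
        m1 += h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        m2 += h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return m1, m2


def cir_variance_exact(t, a, b, c, sigma, r0):
    """Var r(t) = sigma^2/(ac) (r0 - b/c)(e^{-act} - e^{-2act})
                  + b sigma^2 / (2 a c^2) (1 - e^{-2act})."""
    e1, e2 = math.exp(-a * c * t), math.exp(-2 * a * c * t)
    return (sigma**2 / (a * c)) * (r0 - b / c) * (e1 - e2) \
        + b * sigma**2 / (2 * a * c**2) * (1 - e2)
```

As $t \to \infty$ the variance settles at $b\sigma^2/(2ac^2)$, so the process fluctuates around its long-run mean $b/c$ rather than converging to it.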


15.8 Multidimensional Brownian Motion 


Definition 15.2 (d-dimensional Brownian Motion) A $d$-dimensional Brownian Motion is a process

$$B(t) = (B_1(t), \ldots, B_d(t))$$

with the following properties:
• Each $B_k(t)$ is a one-dimensional Brownian motion;
• If $i \neq j$, then the processes $B_i(t)$ and $B_j(t)$ are independent.

Associated with a $d$-dimensional Brownian motion, we have a filtration $\{\mathcal{F}(t)\}$ such that
• For each $t$, the random vector $B(t)$ is $\mathcal{F}(t)$-measurable;
• For each $t \le t_1 \le \cdots \le t_n$, the vector increments
$$B(t_1) - B(t), \ldots, B(t_n) - B(t_{n-1})$$
are independent of $\mathcal{F}(t)$.


15.9 Cross-variations of Brownian motions 


Because each component $B_i$ is a one-dimensional Brownian motion, we have the informal equation

$$dB_i(t)\,dB_i(t) = dt.$$

However, we have:

Theorem 9.49 If $i \neq j$,

$$dB_i(t)\,dB_j(t) = 0.$$


Proof: Let $\Pi = \{t_0, \ldots, t_n\}$ be a partition of $[0,T]$. For $i \neq j$, define the sample cross variation of $B_i$ and $B_j$ on $[0,T]$ to be

$$C_\Pi = \sum_{k=0}^{n-1}\big[B_i(t_{k+1}) - B_i(t_k)\big]\big[B_j(t_{k+1}) - B_j(t_k)\big].$$

The increments appearing on the right-hand side of the above equation are all independent of one another and all have mean zero. Therefore,

$$\mathbb{E}\,C_\Pi = 0.$$
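We compute $\operatorname{var}(C_\Pi)$. First note that

$$\begin{aligned} C_\Pi^2 = {} & \sum_{k=0}^{n-1}\big[B_i(t_{k+1}) - B_i(t_k)\big]^2\big[B_j(t_{k+1}) - B_j(t_k)\big]^2 \\ & + 2\sum_{\ell<k}\big[B_i(t_{\ell+1}) - B_i(t_\ell)\big]\big[B_j(t_{\ell+1}) - B_j(t_\ell)\big]\big[B_i(t_{k+1}) - B_i(t_k)\big]\big[B_j(t_{k+1}) - B_j(t_k)\big]. \end{aligned}$$

All the increments appearing in the sum of cross terms are independent of one another and have mean zero. Therefore,

$$\operatorname{var}(C_\Pi) = \mathbb{E}\,C_\Pi^2 = \mathbb{E}\sum_{k=0}^{n-1}\big[B_i(t_{k+1}) - B_i(t_k)\big]^2\big[B_j(t_{k+1}) - B_j(t_k)\big]^2.$$

But $[B_i(t_{k+1}) - B_i(t_k)]^2$ and $[B_j(t_{k+1}) - B_j(t_k)]^2$ are independent of one another, and each has expectation $(t_{k+1} - t_k)$. It follows that

$$\operatorname{var}(C_\Pi) = \sum_{k=0}^{n-1}(t_{k+1} - t_k)^2 \le \|\Pi\|\sum_{k=0}^{n-1}(t_{k+1} - t_k) = \|\Pi\|\,T.$$

As $\|\Pi\| \to 0$, we have $\operatorname{var}(C_\Pi) \to 0$, so $C_\Pi$ converges to the constant $\mathbb{E}\,C_\Pi = 0$. □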
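A simulation makes the contrast between $dB_i\,dB_j = 0$ and $dB_i\,dB_i = dt$ visible. The Python sketch below (standard library only; the uniform partition and sample size are illustrative choices) computes the sample cross variation $C_\Pi$ of two independent Brownian motions together with the sample quadratic variation of one of them.

```python
import math
import random


def sample_variations(T, n, seed=0):
    """On the uniform partition of [0, T] into n intervals, return
        cross = sum of dB_i * dB_j  (sample cross variation C_Pi),
        quad  = sum of dB_i * dB_i  (sample quadratic variation of B_i),
    where B_i and B_j are independent Brownian motions."""
    rng = random.Random(seed)
    sd = math.sqrt(T / n)
    cross = quad = 0.0
    for _ in range(n):
        dbi = rng.gauss(0.0, sd)
        dbj = rng.gauss(0.0, sd)
        cross += dbi * dbj
        quad += dbi * dbi
    return cross, quad
```

With $T = 1$ and a fine partition, `cross` should be near $0$ (on this partition its variance is $\sum (t_{k+1} - t_k)^2 = T^2/n$) while `quad` should be near $T = 1$.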
15.10 Multi-dimensional Itô formula


To keep the notation as simple as possible, we write the Itô formula for two processes driven by a two-dimensional Brownian motion. The formula generalizes to any number of processes driven by a Brownian motion of any number (not necessarily the same number) of dimensions.


Let $X$ and $Y$ be processes of the form

$$X(t) = X(0) + \int_0^t \alpha(u)\,du + \int_0^t \sigma_{11}(u)\,dB_1(u) + \int_0^t \sigma_{12}(u)\,dB_2(u),$$
$$Y(t) = Y(0) + \int_0^t \beta(u)\,du + \int_0^t \sigma_{21}(u)\,dB_1(u) + \int_0^t \sigma_{22}(u)\,dB_2(u).$$

Such processes, consisting of a nonrandom initial condition, plus a Riemann integral, plus one or more Itô integrals, are called semimartingales. The integrands $\alpha(u)$, $\beta(u)$, and $\sigma_{ij}(u)$ can be any adapted processes. The adaptedness of the integrands guarantees that $X$ and $Y$ are also adapted. In differential notation, we write

$$dX = \alpha\,dt + \sigma_{11}\,dB_1 + \sigma_{12}\,dB_2,$$
$$dY = \beta\,dt + \sigma_{21}\,dB_1 + \sigma_{22}\,dB_2.$$


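Given these two semimartingales $X$ and $Y$, the quadratic and cross variations are:

$$\begin{aligned} dX\,dX &= (\alpha\,dt + \sigma_{11}\,dB_1 + \sigma_{12}\,dB_2)^2 \\ &= \sigma_{11}^2\underbrace{dB_1\,dB_1}_{dt} + 2\sigma_{11}\sigma_{12}\underbrace{dB_1\,dB_2}_{0} + \sigma_{12}^2\underbrace{dB_2\,dB_2}_{dt} \\ &= (\sigma_{11}^2 + \sigma_{12}^2)\,dt, \\ dY\,dY &= (\beta\,dt + \sigma_{21}\,dB_1 + \sigma_{22}\,dB_2)^2 \\ &= (\sigma_{21}^2 + \sigma_{22}^2)\,dt, \\ dX\,dY &= (\alpha\,dt + \sigma_{11}\,dB_1 + \sigma_{12}\,dB_2)(\beta\,dt + \sigma_{21}\,dB_1 + \sigma_{22}\,dB_2) \\ &= (\sigma_{11}\sigma_{21} + \sigma_{12}\sigma_{22})\,dt. \end{aligned}$$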
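The cross-variation rule can be illustrated in the constant-coefficient case. The Python sketch below (standard library only; constant $\alpha$, $\beta$, $\sigma_{ij}$ and all numerical values are illustrative assumptions) sums $\Delta X\,\Delta Y$ over a fine partition, which should approach $(\sigma_{11}\sigma_{21} + \sigma_{12}\sigma_{22})T$; the drift terms contribute nothing in the limit.

```python
import math
import random


def sample_cross_variation(alpha, beta, s11, s12, s21, s22, T, n, seed=0):
    """Sum of dX dY over the uniform partition of [0, T] into n intervals, for
    dX = alpha dt + s11 dB1 + s12 dB2 and dY = beta dt + s21 dB1 + s22 dB2
    with constant coefficients."""
    rng = random.Random(seed)
    dt = T / n
    sd = math.sqrt(dt)
    total = 0.0
    for _ in range(n):
        db1 = rng.gauss(0.0, sd)
        db2 = rng.gauss(0.0, sd)
        dx = alpha * dt + s11 * db1 + s12 * db2
        dy = beta * dt + s21 * db1 + s22 * db2
        total += dx * dy
    return total
```

For example, `sample_cross_variation(1.0, -0.5, 0.3, 0.4, 0.5, -0.2, 2.0, 200000)` should be close to $(0.3 \cdot 0.5 + 0.4 \cdot (-0.2)) \cdot 2 = 0.14$.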


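Let $f(t,x,y)$ be a function of three variables, and let $X(t)$ and $Y(t)$ be semimartingales. Then we have the corresponding Itô formula:

$$df(t,X,Y) = f_t\,dt + f_x\,dX + f_y\,dY + \tfrac12\big[f_{xx}\,dX\,dX + 2f_{xy}\,dX\,dY + f_{yy}\,dY\,dY\big].$$

In integral form, with $X$ and $Y$ as described earlier and with all the variables filled in, this equation is

$$\begin{aligned} & f(t, X(t), Y(t)) - f(0, X(0), Y(0)) \\ &= \int_0^t \Big[f_t + \alpha f_x + \beta f_y + \tfrac12(\sigma_{11}^2 + \sigma_{12}^2)f_{xx} + (\sigma_{11}\sigma_{21} + \sigma_{12}\sigma_{22})f_{xy} + \tfrac12(\sigma_{21}^2 + \sigma_{22}^2)f_{yy}\Big]\,du \\ &\quad + \int_0^t \big[\sigma_{11}f_x + \sigma_{21}f_y\big]\,dB_1 + \int_0^t \big[\sigma_{12}f_x + \sigma_{22}f_y\big]\,dB_2, \end{aligned}$$

where $f = f(u, X(u), Y(u))$, $\alpha = \alpha(u)$, $\beta = \beta(u)$, and for $i, j \in \{1,2\}$, $\sigma_{ij} = \sigma_{ij}(u)$ and $B_i = B_i(u)$.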
Chapter 16


Markov processes and the Kolmogorov equations


16.1 Stochastic Differential Equations 


Consider the stochastic differential equation:

$$dX(t) = a(t, X(t))\,dt + \sigma(t, X(t))\,dB(t). \tag{SDE}$$

Here $a(t,x)$ and $\sigma(t,x)$ are given functions, usually assumed to be continuous in $(t,x)$ and Lipschitz continuous in $x$, i.e., there is a constant $L$ such that

$$|a(t,x) - a(t,y)| \le L|x - y|, \qquad |\sigma(t,x) - \sigma(t,y)| \le L|x - y|$$

for all $t, x, y$.


Let $(t_0, x)$ be given. A solution to (SDE) with the initial condition $(t_0, x)$ is a process $\{X(t)\}_{t \ge t_0}$ satisfying

$$X(t_0) = x,$$
$$X(t) = X(t_0) + \int_{t_0}^t a(s, X(s))\,ds + \int_{t_0}^t \sigma(s, X(s))\,dB(s), \qquad t \ge t_0.$$

The solution process $\{X(t)\}_{t \ge t_0}$ will be adapted to the filtration $\{\mathcal{F}(t)\}_{t \ge 0}$ generated by the Brownian motion. If you know the path of the Brownian motion up to time $t$, then you can evaluate $X(t)$.
Example 16.1 (Drifted Brownian motion) Let $a$ be a constant and $\sigma = 1$, so

$$dX(t) = a\,dt + dB(t).$$

If $(t_0, x)$ is given and we start with the initial condition

$$X(t_0) = x,$$

then

$$X(t) = x + a(t - t_0) + (B(t) - B(t_0)), \qquad t \ge t_0.$$

To compute the differential with respect to $t$, treat $t_0$ and $B(t_0)$ as constants:

$$dX(t) = a\,dt + dB(t).$$

□
Example 16.2 (Geometric Brownian motion) Let $r$ and $\sigma$ be constants. Consider

$$dX(t) = rX(t)\,dt + \sigma X(t)\,dB(t).$$

Given the initial condition

$$X(t_0) = x,$$

the solution is

$$X(t) = x\exp\big\{\sigma(B(t) - B(t_0)) + (r - \tfrac12\sigma^2)(t - t_0)\big\}.$$

Again, to compute the differential with respect to $t$, treat $t_0$ and $B(t_0)$ as constants:

$$\begin{aligned} dX(t) &= (r - \tfrac12\sigma^2)X(t)\,dt + \sigma X(t)\,dB(t) + \tfrac12\sigma^2 X(t)\,dt \\ &= rX(t)\,dt + \sigma X(t)\,dB(t). \end{aligned}$$

□
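To see numerically that the closed-form solution and the SDE describe the same process, the Python sketch below (standard library only) drives an Euler-Maruyama discretization of $dX = rX\,dt + \sigma X\,dB$ and the exact formula of Example 16.2 with the same Brownian increments. The Euler-Maruyama scheme is a standard numerical method not discussed in the text; parameter values are illustrative.

```python
import math
import random


def gbm_euler_vs_exact(x, r, sigma, t0, t, n, seed=0):
    """Return (Euler-Maruyama value, exact value) of X(t) for
    dX = r X dt + sigma X dB with X(t0) = x, both built from the same
    n Brownian increments on [t0, t]."""
    rng = random.Random(seed)
    dt = (t - t0) / n
    X_euler = x
    B_incr = 0.0  # running value of B(s) - B(t0)
    for _ in range(n):
        dB = rng.gauss(0.0, math.sqrt(dt))
        X_euler += r * X_euler * dt + sigma * X_euler * dB
        B_incr += dB
    X_exact = x * math.exp(sigma * B_incr + (r - 0.5 * sigma**2) * (t - t0))
    return X_euler, X_exact
```

The two values agree more closely as `n` grows; the extra $-\tfrac12\sigma^2$ in the exponent of the exact solution is exactly what the Itô correction term $\tfrac12\sigma^2 X\,dt$ in the computation above compensates for.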


16.2 Markov Property 


Let $0 \le t_0 \le t_1$ be given and let $h(y)$ be a function. Denote by

$$\mathbb{E}^{t_0, x}\,h(X(t_1))$$

the expectation of $h(X(t_1))$, given that $X(t_0) = x$. Now let $\xi \in \mathbb{R}$ be given, and start with initial condition

$$X(0) = \xi.$$

We have the Markov property

$$\mathbb{E}^{0,\xi}\big[h(X(t_1)) \,\big|\, \mathcal{F}(t_0)\big] = \mathbb{E}^{t_0, X(t_0)}\,h(X(t_1)).$$

In other words, if you observe the path of the driving Brownian motion from time $0$ to time $t_0$, and based on this information, you want to estimate $h(X(t_1))$, the only relevant information is the value of $X(t_0)$. You imagine starting the (SDE) at time $t_0$ at value $X(t_0)$, and compute the expected value of $h(X(t_1))$.
16.3 Transition density


Denote by

$$p(t_0, t_1; x, y)$$

the density (in the $y$ variable) of $X(t_1)$, conditioned on $X(t_0) = x$. In other words,

$$\mathbb{E}^{t_0, x}\,h(X(t_1)) = \int_{\mathbb{R}} h(y)\,p(t_0, t_1; x, y)\,dy.$$

The Markov property says that for $0 \le t_0 \le t_1$ and for every $\xi$,

$$\mathbb{E}^{0,\xi}\big[h(X(t_1)) \,\big|\, \mathcal{F}(t_0)\big] = \int_{\mathbb{R}} h(y)\,p(t_0, t_1; X(t_0), y)\,dy.$$


Example 16.3 (Drifted Brownian motion) Consider the SDE

$$dX(t) = a\,dt + dB(t).$$

Conditioned on $X(t_0) = x$, the random variable $X(t_1)$ is normal with mean $x + a(t_1 - t_0)$ and variance $(t_1 - t_0)$, i.e.,

$$p(t_0, t_1; x, y) = \frac{1}{\sqrt{2\pi(t_1 - t_0)}}\exp\Big\{-\frac{\big(y - (x + a(t_1 - t_0))\big)^2}{2(t_1 - t_0)}\Big\}.$$

Note that $p$ depends on $t_0$ and $t_1$ only through their difference $t_1 - t_0$. This is always the case when $a(t,x)$ and $\sigma(t,x)$ don't depend on $t$. □


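Example 16.4 (Geometric Brownian motion) Recall that the solution to the SDE

$$dX(t) = rX(t)\,dt + \sigma X(t)\,dB(t),$$

with initial condition $X(t_0) = x$, is geometric Brownian motion:

$$X(t_1) = x\exp\big\{\sigma(B(t_1) - B(t_0)) + (r - \tfrac12\sigma^2)(t_1 - t_0)\big\}.$$

The random variable $B(t_1) - B(t_0)$ has density

$$\mathbb{P}\{B(t_1) - B(t_0) \in db\} = \frac{1}{\sqrt{2\pi(t_1 - t_0)}}\exp\Big\{-\frac{b^2}{2(t_1 - t_0)}\Big\}\,db,$$

and we are making the change of variable

$$y = x\exp\big\{\sigma b + (r - \tfrac12\sigma^2)(t_1 - t_0)\big\}$$

or equivalently,

$$b = \frac{1}{\sigma}\Big[\log\frac{y}{x} - (r - \tfrac12\sigma^2)(t_1 - t_0)\Big].$$

The derivative is

$$\frac{dy}{db} = \sigma y, \qquad\text{or equivalently,}\qquad db = \frac{dy}{\sigma y}.$$

Therefore,

$$p(t_0, t_1; x, y)\,dy = \mathbb{P}\{X(t_1) \in dy\} = \frac{1}{\sigma y\sqrt{2\pi(t_1 - t_0)}}\exp\Big\{-\frac{1}{2(t_1 - t_0)\sigma^2}\Big[\log\frac{y}{x} - (r - \tfrac12\sigma^2)(t_1 - t_0)\Big]^2\Big\}\,dy.$$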


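Using the transition density and a fair amount of calculus, one can compute the expected payoff from a European call:

$$\begin{aligned} \mathbb{E}^{t,x}(X(T) - K)^+ &= \int_0^\infty (y - K)^+\,p(t, T; x, y)\,dy \\ &= e^{r(T-t)}x\,N\Big(\frac{1}{\sigma\sqrt{T-t}}\Big[\log\frac{x}{K} + r(T-t) + \tfrac12\sigma^2(T-t)\Big]\Big) \\ &\quad - K\,N\Big(\frac{1}{\sigma\sqrt{T-t}}\Big[\log\frac{x}{K} + r(T-t) - \tfrac12\sigma^2(T-t)\Big]\Big), \end{aligned}$$

where

$$N(\eta) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\eta} e^{-\frac12\zeta^2}\,d\zeta.$$

Therefore,

$$\begin{aligned} \mathbb{E}^{0,\xi}\big[e^{-r(T-t)}(X(T) - K)^+ \,\big|\, \mathcal{F}(t)\big] &= X(t)\,N\Big(\frac{1}{\sigma\sqrt{T-t}}\Big[\log\frac{X(t)}{K} + r(T-t) + \tfrac12\sigma^2(T-t)\Big]\Big) \\ &\quad - e^{-r(T-t)}K\,N\Big(\frac{1}{\sigma\sqrt{T-t}}\Big[\log\frac{X(t)}{K} + r(T-t) - \tfrac12\sigma^2(T-t)\Big]\Big). \end{aligned}$$

□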
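The closed form for the expected call payoff can be checked against a direct numerical integration of $(y - K)^+$ against the geometric Brownian motion transition density of Example 16.4. The Python sketch below (standard library only) uses the composite Simpson rule; the rule, the cut-off `y_max`, and the parameter values are illustrative choices, and $K > 0$ is assumed.

```python
import math


def gbm_density(tau, x, y, r, sigma):
    """Transition density p(t, t + tau; x, y) of geometric Brownian motion."""
    if y <= 0.0:
        return 0.0
    z = math.log(y / x) - (r - 0.5 * sigma**2) * tau
    return math.exp(-z * z / (2.0 * tau * sigma**2)) / (sigma * y * math.sqrt(2.0 * math.pi * tau))


def expected_call_by_quadrature(tau, x, K, r, sigma, n=20000):
    """Composite Simpson rule (n even) for the integral of (y - K) p(tau; x, y)
    over [K, y_max], with y_max far in the right tail of the density."""
    y_max = x * math.exp(r * tau + 10.0 * sigma * math.sqrt(tau))
    h = (y_max - K) / n
    f = lambda y: (y - K) * gbm_density(tau, x, y, r, sigma)
    s = f(K) + f(y_max)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(K + i * h)
    return s * h / 3.0


def expected_call_formula(tau, x, K, r, sigma):
    """e^{r tau} x N(d_+) - K N(d_-): the undiscounted expected payoff above."""
    N = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    s = sigma * math.sqrt(tau)
    d_plus = (math.log(x / K) + r * tau + 0.5 * sigma**2 * tau) / s
    return math.exp(r * tau) * x * N(d_plus) - K * N(d_plus - s)
```

Multiplying either result by $e^{-r\tau}$ gives the discounted value in the last display.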
16.4 The Kolmogorov Backward Equation

Consider

$$dX(t) = a(t, X(t))\,dt + \sigma(t, X(t))\,dB(t),$$

and let $p(t_0, t_1; x, y)$ be the transition density. Then the Kolmogorov Backward Equation is:

$$-\frac{\partial}{\partial t_0}p(t_0, t_1; x, y) = a(t_0, x)\frac{\partial}{\partial x}p(t_0, t_1; x, y) + \tfrac12\sigma^2(t_0, x)\frac{\partial^2}{\partial x^2}p(t_0, t_1; x, y). \tag{KBE}$$

The variables $t_0$ and $x$ in (KBE) are called the backward variables.

In the case that $a$ and $\sigma$ are functions of $x$ alone, $p(t_0, t_1; x, y)$ depends on $t_0$ and $t_1$ only through their difference $\tau = t_1 - t_0$. We then write $p(\tau; x, y)$ rather than $p(t_0, t_1; x, y)$, and (KBE) becomes

$$\frac{\partial}{\partial\tau}p(\tau; x, y) = a(x)\frac{\partial}{\partial x}p(\tau; x, y) + \tfrac12\sigma^2(x)\frac{\partial^2}{\partial x^2}p(\tau; x, y). \tag{KBE'}$$
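Example 16.5 (Drifted Brownian motion)

$$dX(t) = a\,dt + dB(t),$$

$$p(\tau; x, y) = \frac{1}{\sqrt{2\pi\tau}}\exp\Big\{-\frac{(y - x - a\tau)^2}{2\tau}\Big\}.$$

Then

$$p_\tau = \frac{\partial}{\partial\tau}p = \Big[-\frac{1}{2\tau} + \frac{a(y - x - a\tau)}{\tau} + \frac{(y - x - a\tau)^2}{2\tau^2}\Big]p,$$

$$p_x = \frac{\partial}{\partial x}p = \frac{y - x - a\tau}{\tau}\,p,$$

$$p_{xx} = \frac{\partial^2}{\partial x^2}p = \Big[-\frac{1}{\tau} + \frac{(y - x - a\tau)^2}{\tau^2}\Big]p.$$

Therefore,

$$a p_x + \tfrac12 p_{xx} = \Big[\frac{a(y - x - a\tau)}{\tau} - \frac{1}{2\tau} + \frac{(y - x - a\tau)^2}{2\tau^2}\Big]p = p_\tau.$$

This is the Kolmogorov backward equation. □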


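Example 16.6 (Geometric Brownian motion)

$$dX(t) = rX(t)\,dt + \sigma X(t)\,dB(t),$$

$$p(\tau; x, y) = \frac{1}{\sigma y\sqrt{2\pi\tau}}\exp\Big\{-\frac{1}{2\tau\sigma^2}\Big[\log\frac{y}{x} - (r - \tfrac12\sigma^2)\tau\Big]^2\Big\}.$$

It is true but very tedious to verify that $p$ satisfies the KBE

$$p_\tau = rx\,p_x + \tfrac12\sigma^2 x^2\,p_{xx}.$$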
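The tedious verification can at least be checked numerically. The Python sketch below (standard library only) evaluates $p_\tau - a(x)p_x - \tfrac12\sigma^2(x)p_{xx}$ by central finite differences, with $y$ held fixed, for the densities of Examples 16.5 and 16.6. Step sizes and parameter values are illustrative choices; a residual near zero is numerical evidence, not a proof.

```python
import math


def p_drifted_bm(tau, x, y, a):
    """Transition density of dX = a dt + dB (Example 16.5)."""
    return math.exp(-(y - x - a * tau) ** 2 / (2 * tau)) / math.sqrt(2 * math.pi * tau)


def p_gbm(tau, x, y, r, sigma):
    """Transition density of dX = r X dt + sigma X dB (Example 16.6)."""
    z = math.log(y / x) - (r - 0.5 * sigma**2) * tau
    return math.exp(-z * z / (2 * tau * sigma**2)) / (sigma * y * math.sqrt(2 * math.pi * tau))


def kbe_residual(p, drift, diff2, tau, x, h_tau=1e-5, h_x=1e-3):
    """Finite-difference value of p_tau - drift(x) p_x - 0.5 diff2(x) p_xx,
    where p = p(tau, x) (y held fixed) and diff2(x) = sigma(x)^2."""
    p_tau = (p(tau + h_tau, x) - p(tau - h_tau, x)) / (2 * h_tau)
    p_x = (p(tau, x + h_x) - p(tau, x - h_x)) / (2 * h_x)
    p_xx = (p(tau, x + h_x) - 2 * p(tau, x) + p(tau, x - h_x)) / h_x**2
    return p_tau - drift(x) * p_x - 0.5 * diff2(x) * p_xx
```

For geometric Brownian motion, pass `drift=lambda x: r * x` and `diff2=lambda x: (sigma * x) ** 2`; replacing the drift by zero produces a clearly nonzero residual, which shows the check is not vacuous.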


16.5 Connection between stochastic calculus and KBE 


Consider

$$dX(t) = a(X(t))\,dt + \sigma(X(t))\,dB(t). \tag{5.1}$$

Let $h(y)$ be a function, and define

$$v(t,x) = \mathbb{E}^{t,x}\,h(X(T)),$$

where $0 \le t \le T$. Then

$$v(t,x) = \int h(y)\,p(T - t; x, y)\,dy,$$
$$v_t(t,x) = -\int h(y)\,p_\tau(T - t; x, y)\,dy,$$
$$v_x(t,x) = \int h(y)\,p_x(T - t; x, y)\,dy,$$
$$v_{xx}(t,x) = \int h(y)\,p_{xx}(T - t; x, y)\,dy.$$

Therefore, the Kolmogorov backward equation implies

$$v_t(t,x) + a(x)v_x(t,x) + \tfrac12\sigma^2(x)v_{xx}(t,x) = \int h(y)\Big[-p_\tau(T - t; x, y) + a(x)p_x(T - t; x, y) + \tfrac12\sigma^2(x)p_{xx}(T - t; x, y)\Big]\,dy = 0.$$

Let $(0, \xi)$ be an initial condition for the SDE (5.1). We simplify notation by writing $\mathbb{E}$ rather than $\mathbb{E}^{0,\xi}$.


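Theorem 5.50 Starting at $X(0) = \xi$, the process $v(t, X(t))$ satisfies the martingale property:

$$\mathbb{E}\big[v(t, X(t)) \,\big|\, \mathcal{F}(s)\big] = v(s, X(s)), \qquad 0 \le s \le t \le T.$$

Proof: According to the Markov property,

$$\mathbb{E}\big[h(X(T)) \,\big|\, \mathcal{F}(t)\big] = \mathbb{E}^{t, X(t)}\,h(X(T)) = v(t, X(t)),$$

so

$$\mathbb{E}\big[v(t, X(t)) \,\big|\, \mathcal{F}(s)\big] = \mathbb{E}\Big[\mathbb{E}\big[h(X(T)) \,\big|\, \mathcal{F}(t)\big] \,\Big|\, \mathcal{F}(s)\Big] = \mathbb{E}\big[h(X(T)) \,\big|\, \mathcal{F}(s)\big] = v(s, X(s)).$$

Itô's formula implies

$$\begin{aligned} dv(t, X(t)) &= v_t\,dt + v_x\,dX + \tfrac12 v_{xx}\,dX\,dX \\ &= v_t\,dt + a v_x\,dt + \sigma v_x\,dB + \tfrac12\sigma^2 v_{xx}\,dt. \end{aligned}$$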
In integral form, we have

$$\begin{aligned} v(t, X(t)) = {} & v(0, X(0)) + \int_0^t \big[v_t(u, X(u)) + a(X(u))v_x(u, X(u)) + \tfrac12\sigma^2(X(u))v_{xx}(u, X(u))\big]\,du \\ & + \int_0^t \sigma(X(u))v_x(u, X(u))\,dB(u). \end{aligned}$$

We know that $v(t, X(t))$ is a martingale, so the integral $\int_0^t \big[v_t + a v_x + \tfrac12\sigma^2 v_{xx}\big]\,du$ must be zero for all $t$. This implies that the integrand is zero; hence

$$v_t + a v_x + \tfrac12\sigma^2 v_{xx} = 0.$$

Thus by two different arguments, one based on the Kolmogorov backward equation, and the other based on Itô's formula, we have come to the same conclusion.


Theorem 5.51 (Feynman-Kac) Define

$$v(t,x) = \mathbb{E}^{t,x}\,h(X(T)), \qquad 0 \le t \le T,$$

where

$$dX(t) = a(X(t))\,dt + \sigma(X(t))\,dB(t).$$

Then

$$v_t(t,x) + a(x)v_x(t,x) + \tfrac12\sigma^2(x)v_{xx}(t,x) = 0 \tag{FK}$$

and

$$v(T,x) = h(x).$$
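A concrete instance, chosen for illustration: take the drifted Brownian motion of Example 16.1 ($a(x) = a$, $\sigma(x) = 1$) and $h(y) = y^2$. Then $v(t,x) = (x + a(T-t))^2 + (T-t)$, and directly $v_t + a v_x + \tfrac12 v_{xx} = -2a(x + a(T-t)) - 1 + 2a(x + a(T-t)) + 1 = 0$. The Python sketch below (standard library only; `A`, `T` and the sample size are arbitrary) confirms (FK) by finite differences and the defining expectation by Monte Carlo.

```python
import math
import random

A, T = 0.3, 2.0  # illustrative constant drift a and horizon T


def v_closed(t, x):
    """E^{t,x} X(T)^2 for dX = A dt + dB, i.e. (x + A (T - t))^2 + (T - t)."""
    tau = T - t
    return (x + A * tau) ** 2 + tau


def v_monte_carlo(t, x, n=100000, seed=0):
    """Average of X(T)^2 using the exact solution of Example 16.1."""
    rng = random.Random(seed)
    sd = math.sqrt(T - t)
    total = 0.0
    for _ in range(n):
        xT = x + A * (T - t) + rng.gauss(0.0, sd)
        total += xT * xT
    return total / n


def fk_residual(t, x, h=1e-4):
    """Central-difference value of v_t + A v_x + 0.5 v_xx."""
    v_t = (v_closed(t + h, x) - v_closed(t - h, x)) / (2 * h)
    v_x = (v_closed(t, x + h) - v_closed(t, x - h)) / (2 * h)
    v_xx = (v_closed(t, x + h) - 2 * v_closed(t, x) + v_closed(t, x - h)) / h**2
    return v_t + A * v_x + 0.5 * v_xx
```

The terminal condition also holds: `v_closed(T, x)` returns `x ** 2`.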


The Black-Scholes equation is a special case of this theorem, as we show in the next section. 


Remark 16.1 (Derivation of KBE) We plunked down the Kolmogorov backward equation without any justification. In fact, one can use Itô's formula to prove the Feynman-Kac Theorem, and use the Feynman-Kac Theorem to derive the Kolmogorov backward equation.


16.6 Black-Scholes 
Consider the SDE

$$dS(t) = rS(t)\,dt + \sigma S(t)\,dB(t).$$

With initial condition

$$S(t) = x,$$

the solution is

$$S(u) = x\exp\big\{\sigma(B(u) - B(t)) + (r - \tfrac12\sigma^2)(u - t)\big\}, \qquad u \ge t.$$
Define

$$v(t,x) = \mathbb{E}^{t,x}\,h(S(T)) = \mathbb{E}\,h\Big(x\exp\big\{\sigma(B(T) - B(t)) + (r - \tfrac12\sigma^2)(T - t)\big\}\Big),$$

where $h$ is a function to be specified later.

Recall the Independence Lemma: If $\mathcal{G}$ is a $\sigma$-field, $X$ is $\mathcal{G}$-measurable, and $Y$ is independent of $\mathcal{G}$, then

$$\mathbb{E}\big[h(X, Y) \,\big|\, \mathcal{G}\big] = g(X),$$

where

$$g(x) = \mathbb{E}\,h(x, Y).$$


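With geometric Brownian motion, for $0 \le t \le T$, we have

$$S(T) = \underbrace{S(t)}_{\mathcal{F}(t)\text{-measurable}}\;\underbrace{\exp\big\{\sigma(B(T) - B(t)) + (r - \tfrac12\sigma^2)(T - t)\big\}}_{\text{independent of }\mathcal{F}(t)}.$$

We thus have

$$S(T) = XY,$$

where

$$X = S(t), \qquad Y = \exp\big\{\sigma(B(T) - B(t)) + (r - \tfrac12\sigma^2)(T - t)\big\}.$$

Now

$$\mathbb{E}\,h(xY) = v(t,x).$$

The independence lemma implies

$$\mathbb{E}\big[h(S(T)) \,\big|\, \mathcal{F}(t)\big] = \mathbb{E}\big[h(XY) \,\big|\, \mathcal{F}(t)\big] = v(t, X) = v(t, S(t)).$$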
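We have shown that

$$v(t, S(t)) = \mathbb{E}\big[h(S(T)) \,\big|\, \mathcal{F}(t)\big], \qquad 0 \le t \le T.$$

Note that the random variable $h(S(T))$ whose conditional expectation is being computed does not depend on $t$. Because of this, the tower property implies that $v(t, S(t))$, $0 \le t \le T$, is a martingale: For $0 \le s \le t \le T$,

$$\mathbb{E}\big[v(t, S(t)) \,\big|\, \mathcal{F}(s)\big] = \mathbb{E}\Big[\mathbb{E}\big[h(S(T)) \,\big|\, \mathcal{F}(t)\big] \,\Big|\, \mathcal{F}(s)\Big] = \mathbb{E}\big[h(S(T)) \,\big|\, \mathcal{F}(s)\big] = v(s, S(s)).$$

This is a special case of Theorem 5.50.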


Because $v(t, S(t))$ is a martingale, the sum of the $dt$ terms in $dv(t, S(t))$ must be $0$. By Itô's formula,

$$\begin{aligned} dv(t, S(t)) = {} & \big[v_t(t, S(t)) + rS(t)v_x(t, S(t)) + \tfrac12\sigma^2 S^2(t)v_{xx}(t, S(t))\big]\,dt \\ & + \sigma S(t)v_x(t, S(t))\,dB(t). \end{aligned}$$

This leads us to the equation

$$v_t(t,x) + rx\,v_x(t,x) + \tfrac12\sigma^2 x^2 v_{xx}(t,x) = 0, \qquad 0 \le t < T,\ x \ge 0.$$

This is a special case of Theorem 5.51 (Feynman-Kac).

Along with the above partial differential equation, we have the terminal condition

$$v(T,x) = h(x), \qquad x \ge 0.$$

Furthermore, if $S(t) = 0$ for some $t \in [0,T]$, then also $S(T) = 0$. This gives us the boundary condition

$$v(t, 0) = h(0), \qquad 0 \le t \le T.$$


Finally, we shall eventually see that, if $S(t) = x$, the value at time $t$ of a contingent claim paying $h(S(T))$ is

$$u(t,x) = e^{-r(T-t)}\,\mathbb{E}^{t,x}\,h(S(T)) = e^{-r(T-t)}\,v(t,x).$$

Therefore,

$$v(t,x) = e^{r(T-t)}u(t,x),$$
$$v_t(t,x) = -re^{r(T-t)}u(t,x) + e^{r(T-t)}u_t(t,x),$$
$$v_x(t,x) = e^{r(T-t)}u_x(t,x),$$
$$v_{xx}(t,x) = e^{r(T-t)}u_{xx}(t,x).$$

Plugging these formulas into the partial differential equation for $v$ and cancelling the $e^{r(T-t)}$ appearing in every term, we obtain the Black-Scholes partial differential equation:

$$-ru(t,x) + u_t(t,x) + rx\,u_x(t,x) + \tfrac12\sigma^2 x^2 u_{xx}(t,x) = 0, \qquad 0 \le t < T,\ x \ge 0. \tag{BS}$$

Compare this with the earlier derivation of the Black-Scholes PDE in Section 15.6.
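In terms of the transition density

$$p(t, T; x, y) = \frac{1}{\sigma y\sqrt{2\pi(T - t)}}\exp\Big\{-\frac{1}{2(T - t)\sigma^2}\Big[\log\frac{y}{x} - (r - \tfrac12\sigma^2)(T - t)\Big]^2\Big\}$$

for geometric Brownian motion (see Example 16.4), we have the "stochastic representation"

$$u(t,x) = e^{-r(T-t)}\,\mathbb{E}^{t,x}\,h(S(T)) = e^{-r(T-t)}\int_0^\infty h(y)\,p(t, T; x, y)\,dy. \tag{SR}$$

In the case of a call,

$$h(y) = (y - K)^+$$

and

$$\begin{aligned} u(t,x) = {} & x\,N\Big(\frac{1}{\sigma\sqrt{T-t}}\Big[\log\frac{x}{K} + r(T-t) + \tfrac12\sigma^2(T-t)\Big]\Big) \\ & - e^{-r(T-t)}K\,N\Big(\frac{1}{\sigma\sqrt{T-t}}\Big[\log\frac{x}{K} + r(T-t) - \tfrac12\sigma^2(T-t)\Big]\Big). \end{aligned}$$

Even if $h(y)$ is some other function (e.g., $h(y) = (K - y)^+$, a put), $u(t,x)$ is still given by (SR) and satisfies the Black-Scholes PDE (BS) derived above.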


16.7 Black-Scholes with price-dependent volatility 


$$dS(t) = rS(t)\,dt + \beta(S(t))\,dB(t),$$

$$v(t,x) = e^{-r(T-t)}\,\mathbb{E}^{t,x}(S(T) - K)^+.$$

The Feynman-Kac Theorem now implies that

$$-rv(t,x) + v_t(t,x) + rx\,v_x(t,x) + \tfrac12\beta^2(x)v_{xx}(t,x) = 0, \qquad 0 \le t < T,\ x > 0.$$

$v$ also satisfies the terminal condition

$$v(T,x) = (x - K)^+, \qquad x \ge 0,$$

and the boundary condition

$$v(t, 0) = 0, \qquad 0 \le t \le T.$$


An example of such a process is the following from J.C. Cox, Notes on options pricing I: Constant elasticity of variance diffusions, Working Paper, Stanford University, 1975:

$$dS(t) = rS(t)\,dt + \sigma S^\delta(t)\,dB(t),$$

where $0 \le \delta < 1$. The "volatility" $\sigma S^{\delta - 1}(t)$ decreases with increasing stock price. The corresponding Black-Scholes equation is

$$-rv + v_t + rx\,v_x + \tfrac12\sigma^2 x^{2\delta}v_{xx} = 0, \qquad 0 \le t < T,\ x > 0,$$
$$v(t, 0) = 0, \qquad 0 \le t \le T,$$
$$v(T, x) = (x - K)^+, \qquad x \ge 0.$$
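The constant elasticity of variance problem above can be solved numerically. The Python sketch below (standard library only) steps the equation backward from the terminal condition with a fully implicit finite-difference scheme, a standard numerical method that is not part of the text. Besides $v(t,0) = 0$ it assumes, on the truncated domain $[0, x_{\max}]$, the far-field condition $v(t, x_{\max}) = x_{\max} - Ke^{-r(T-t)}$; grid sizes and parameters are illustrative. The same equation with $\delta = 1$ is the Black-Scholes equation, which gives a check against the call formula.

```python
import math


def thomas(lo, di, up, rhs):
    """Solve a tridiagonal linear system; lo, di, up are the sub-, main and
    super-diagonals (lo[0] and up[-1] are ignored)."""
    n = len(di)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = up[0] / di[0], rhs[0] / di[0]
    for i in range(1, n):
        m = di[i] - lo[i] * c[i - 1]
        c[i] = up[i] / m if i < n - 1 else 0.0
        d[i] = (rhs[i] - lo[i] * d[i - 1]) / m
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def cev_call_implicit_fd(x0, K, r, sigma, delta, T, n_x=400, n_t=400):
    """Approximate v(0, x0) for -r v + v_t + r x v_x + 0.5 sigma^2 x^(2 delta) v_xx = 0,
    v(T, x) = (x - K)^+, with a fully implicit scheme in tau = T - t."""
    x_max = 4.0 * max(x0, K)  # truncated spatial domain [0, x_max]
    dx, dt = x_max / n_x, T / n_t
    xs = [i * dx for i in range(n_x + 1)]
    v = [max(x - K, 0.0) for x in xs]  # terminal condition
    lo, di, up = [0.0] * (n_x + 1), [1.0] * (n_x + 1), [0.0] * (n_x + 1)
    for i in range(1, n_x):
        diff = 0.5 * sigma**2 * xs[i] ** (2 * delta) / dx**2
        drift = r * xs[i] / (2 * dx)
        lo[i] = -dt * (diff - drift)
        di[i] = 1.0 + dt * (2 * diff + r)
        up[i] = -dt * (diff + drift)
    for step in range(1, n_t + 1):
        rhs = v[:]
        rhs[0] = 0.0  # boundary condition v(t, 0) = 0
        rhs[-1] = x_max - K * math.exp(-r * step * dt)  # assumed far-field value
        v = thomas(lo, di, up, rhs)
    i = min(int(x0 / dx), n_x - 1)  # linear interpolation at x0
    w = (x0 - xs[i]) / dx
    return (1 - w) * v[i] + w * v[i + 1]
```

The fully implicit scheme was chosen because it is unconditionally stable, so the time step need not shrink with $\sigma^2 x_{\max}^{2\delta}/dx^2$; its price is first-order accuracy in time.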
Chapter 17


Girsanov's theorem and the risk-neutral measure


(Please see Oksendal, 4th ed., pp 145-151.) 


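Theorem 0.52 (Girsanov, One-dimensional) Let $B(t)$, $0 \le t \le T$, be a Brownian motion on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$. Let $\mathcal{F}(t)$, $0 \le t \le T$, be the accompanying filtration, and let $\theta(t)$, $0 \le t \le T$, be a process adapted to this filtration. For $0 \le t \le T$, define

$$\tilde{B}(t) = \int_0^t \theta(u)\,du + B(t),$$
$$Z(t) = \exp\Big\{-\int_0^t \theta(u)\,dB(u) - \tfrac12\int_0^t \theta^2(u)\,du\Big\},$$

and define a new probability measure by

$$\tilde{\mathbb{P}}(A) = \int_A Z(T)\,d\mathbb{P}, \qquad \forall A \in \mathcal{F}.$$

Under $\tilde{\mathbb{P}}$, the process $\tilde{B}(t)$, $0 \le t \le T$, is a Brownian motion.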


Caveat: This theorem requires a technical condition on the size of $\theta$. If

$$\mathbb{E}\exp\Big\{\tfrac12\int_0^T \theta^2(u)\,du\Big\} < \infty,$$

everything is OK.

We make the following remarks:

$Z(t)$ is a martingale. In fact,

$$dZ(t) = -\theta(t)Z(t)\,dB(t).$$
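$\tilde{\mathbb{P}}$ is a probability measure. Since $Z(0) = 1$, we have $\mathbb{E}\,Z(t) = 1$ for every $t \ge 0$. In particular

$$\tilde{\mathbb{P}}(\Omega) = \int_\Omega Z(T)\,d\mathbb{P} = \mathbb{E}\,Z(T) = 1,$$

so $\tilde{\mathbb{P}}$ is a probability measure.

$\tilde{\mathbb{E}}$ in terms of $\mathbb{E}$. Let $\tilde{\mathbb{E}}$ denote expectation under $\tilde{\mathbb{P}}$. If $X$ is a random variable, then

$$\tilde{\mathbb{E}}X = \mathbb{E}[Z(T)X].$$

To see this, consider first the case $X = \mathbf{1}_A$, where $A \in \mathcal{F}$. We have

$$\tilde{\mathbb{E}}X = \tilde{\mathbb{P}}(A) = \int_A Z(T)\,d\mathbb{P} = \int_\Omega Z(T)\mathbf{1}_A\,d\mathbb{P} = \mathbb{E}[Z(T)X].$$

Now use Williams' "standard machine".

$\mathbb{P}$ and $\tilde{\mathbb{P}}$. The intuition behind the formula

$$\tilde{\mathbb{P}}(A) = \int_A Z(T)\,d\mathbb{P} \qquad \forall A \in \mathcal{F}$$

is that we want to have

$$\tilde{\mathbb{P}}(\omega) = Z(T,\omega)\,\mathbb{P}(\omega),$$

but since $\mathbb{P}(\omega) = 0$ and $\tilde{\mathbb{P}}(\omega) = 0$, this doesn't really tell us anything useful about $\tilde{\mathbb{P}}$. Thus, we consider subsets of $\Omega$, rather than individual elements of $\Omega$.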


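Distribution of $\tilde{B}(T)$. If $\theta$ is constant, then

$$Z(T) = \exp\{-\theta B(T) - \tfrac12\theta^2 T\},$$
$$\tilde{B}(T) = \theta T + B(T).$$

Under $\mathbb{P}$, $B(T)$ is normal with mean $0$ and variance $T$, so $\tilde{B}(T)$ is normal with mean $\theta T$ and variance $T$:

$$\mathbb{P}\{\tilde{B}(T) \in d\tilde{b}\} = \frac{1}{\sqrt{2\pi T}}\exp\Big\{-\frac{(\tilde{b} - \theta T)^2}{2T}\Big\}\,d\tilde{b}.$$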


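Removal of Drift from $\tilde{B}(T)$. The change of measure from $\mathbb{P}$ to $\tilde{\mathbb{P}}$ removes the drift from $\tilde{B}(T)$. To see this, we compute

$$\begin{aligned} \tilde{\mathbb{E}}\tilde{B}(T) &= \mathbb{E}[Z(T)(\theta T + B(T))] \\ &= \mathbb{E}\big[\exp\{-\theta B(T) - \tfrac12\theta^2 T\}(\theta T + B(T))\big] \\ &= \frac{1}{\sqrt{2\pi T}}\int_{-\infty}^\infty (\theta T + b)\exp\{-\theta b - \tfrac12\theta^2 T\}\exp\Big\{-\frac{b^2}{2T}\Big\}\,db \\ &= \frac{1}{\sqrt{2\pi T}}\int_{-\infty}^\infty (\theta T + b)\exp\Big\{-\frac{(b + \theta T)^2}{2T}\Big\}\,db \\ &= \frac{1}{\sqrt{2\pi T}}\int_{-\infty}^\infty y\exp\Big\{-\frac{y^2}{2T}\Big\}\,dy \qquad (\text{substitute } y = \theta T + b) \\ &= 0. \end{aligned}$$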
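We can also see that $\tilde{\mathbb{E}}\tilde{B}(T) = 0$ by arguing directly from the density formula

$$\mathbb{P}\{\tilde{B}(T) \in d\tilde{b}\} = \frac{1}{\sqrt{2\pi T}}\exp\Big\{-\frac{(\tilde{b} - \theta T)^2}{2T}\Big\}\,d\tilde{b}.$$

Because

$$\begin{aligned} Z(T) &= \exp\{-\theta B(T) - \tfrac12\theta^2 T\} \\ &= \exp\{-\theta(\tilde{B}(T) - \theta T) - \tfrac12\theta^2 T\} \\ &= \exp\{-\theta\tilde{B}(T) + \tfrac12\theta^2 T\}, \end{aligned}$$

we have

$$\begin{aligned} \tilde{\mathbb{P}}\{\tilde{B}(T) \in d\tilde{b}\} &= \mathbb{P}\{\tilde{B}(T) \in d\tilde{b}\}\exp\{-\theta\tilde{b} + \tfrac12\theta^2 T\} \\ &= \frac{1}{\sqrt{2\pi T}}\exp\Big\{-\frac{(\tilde{b} - \theta T)^2}{2T} - \theta\tilde{b} + \tfrac12\theta^2 T\Big\}\,d\tilde{b} \\ &= \frac{1}{\sqrt{2\pi T}}\exp\Big\{-\frac{\tilde{b}^2}{2T}\Big\}\,d\tilde{b}. \end{aligned}$$

Under $\tilde{\mathbb{P}}$, $\tilde{B}(T)$ is normal with mean zero and variance $T$. Under $\mathbb{P}$, $\tilde{B}(T)$ is normal with mean $\theta T$ and variance $T$.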
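The identity $\tilde{\mathbb{E}}X = \mathbb{E}[Z(T)X]$ also gives a direct Monte Carlo check: sample $B(T)$ under $\mathbb{P}$, form $\tilde{B}(T) = \theta T + B(T)$, and weight each sample by $Z(T)$. The Python sketch below (standard library only) assumes constant $\theta$; sample size and parameter values are illustrative.

```python
import math
import random


def girsanov_moments(theta, T, n=100000, seed=0):
    """Estimate E[Z(T)], E~[B~(T)] = E[Z(T) B~(T)] and E~[B~(T)^2] by sampling
    B(T) under P, with Z(T) = exp(-theta B(T) - theta^2 T / 2) and
    B~(T) = theta T + B(T)."""
    rng = random.Random(seed)
    sum_z = sum_zb = sum_zb2 = 0.0
    for _ in range(n):
        b = rng.gauss(0.0, math.sqrt(T))
        z = math.exp(-theta * b - 0.5 * theta**2 * T)
        bt = theta * T + b
        sum_z += z
        sum_zb += z * bt
        sum_zb2 += z * bt * bt
    return sum_z / n, sum_zb / n, sum_zb2 / n
```

The three estimates should be close to $1$, $0$ and $T$: the weights average to one, the weighting removes the mean $\theta T$, and the second moment (here equal to the variance) is unchanged.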


Means change, variances don't. When we use the Girsanov Theorem to change the probability measure, means change but variances do not. Martingales may be destroyed or created. Volatilities, quadratic variations and cross variations are unaffected. Check:

$$d\tilde{B}\,d\tilde{B} = (\theta(t)\,dt + dB(t))^2 = dB\,dB = dt.$$
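17.1 Conditional expectations under $\tilde{\mathbb{P}}$


Lemma 1.53 Let $0 \le t \le T$. If $X$ is $\mathcal{F}(t)$-measurable, then

$$\tilde{\mathbb{E}}X = \mathbb{E}[X Z(t)].$$

Proof:

$$\tilde{\mathbb{E}}X = \mathbb{E}[X Z(T)] = \mathbb{E}\big[\mathbb{E}[X Z(T) \,|\, \mathcal{F}(t)]\big] = \mathbb{E}\big[X\,\mathbb{E}[Z(T) \,|\, \mathcal{F}(t)]\big] = \mathbb{E}[X Z(t)],$$

because $Z(t)$, $0 \le t \le T$, is a martingale under $\mathbb{P}$. □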
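Lemma 1.54 (Bayes' Rule) If $X$ is $\mathcal{F}(t)$-measurable and $0 \le s \le t \le T$, then

$$\tilde{\mathbb{E}}[X \,|\, \mathcal{F}(s)] = \frac{1}{Z(s)}\,\mathbb{E}[X Z(t) \,|\, \mathcal{F}(s)]. \tag{1.1}$$

Proof: It is clear that $\frac{1}{Z(s)}\,\mathbb{E}[X Z(t) \,|\, \mathcal{F}(s)]$ is $\mathcal{F}(s)$-measurable. We check the partial averaging property. For $A \in \mathcal{F}(s)$, we have

$$\begin{aligned} \int_A \frac{1}{Z(s)}\,\mathbb{E}[X Z(t) \,|\, \mathcal{F}(s)]\,d\tilde{\mathbb{P}} &= \tilde{\mathbb{E}}\Big[\mathbf{1}_A\frac{1}{Z(s)}\,\mathbb{E}[X Z(t) \,|\, \mathcal{F}(s)]\Big] \\ &= \mathbb{E}\big[\mathbf{1}_A\,\mathbb{E}[X Z(t) \,|\, \mathcal{F}(s)]\big] && \text{(Lemma 1.53)} \\ &= \mathbb{E}\big[\mathbb{E}[\mathbf{1}_A X Z(t) \,|\, \mathcal{F}(s)]\big] && \text{(Taking in what is known)} \\ &= \mathbb{E}[\mathbf{1}_A X Z(t)] \\ &= \tilde{\mathbb{E}}[\mathbf{1}_A X] && \text{(Lemma 1.53 again)} \\ &= \int_A X\,d\tilde{\mathbb{P}}. \end{aligned}$$

□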

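Although we have proved Lemmas 1.53 and 1.54, we have not proved Girsanov's Theorem. We will not prove it completely, but here is the beginning of the proof.

Lemma 1.55 Using the notation of Girsanov's Theorem, we have the martingale property

$$\tilde{\mathbb{E}}[\tilde{B}(t) \,|\, \mathcal{F}(s)] = \tilde{B}(s), \qquad 0 \le s \le t \le T.$$

Proof: We first check that $\tilde{B}(t)Z(t)$ is a martingale under $\mathbb{P}$. Recall

$$d\tilde{B}(t) = \theta(t)\,dt + dB(t),$$
$$dZ(t) = -\theta(t)Z(t)\,dB(t).$$

Therefore,

$$\begin{aligned} d(\tilde{B}Z) &= \tilde{B}\,dZ + Z\,d\tilde{B} + d\tilde{B}\,dZ \\ &= -\tilde{B}\theta Z\,dB + Z\theta\,dt + Z\,dB - \theta Z\,dt \\ &= (-\tilde{B}\theta Z + Z)\,dB. \end{aligned}$$

Next we use Bayes' Rule. For $0 \le s \le t \le T$,

$$\tilde{\mathbb{E}}[\tilde{B}(t) \,|\, \mathcal{F}(s)] = \frac{1}{Z(s)}\,\mathbb{E}[\tilde{B}(t)Z(t) \,|\, \mathcal{F}(s)] = \frac{1}{Z(s)}\,\tilde{B}(s)Z(s) = \tilde{B}(s).$$

□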
Definition 17.1 (Equivalent measures) Two measures on the same probability space which have the same measure-zero sets are said to be equivalent.

The probability measures $\mathbb{P}$ and $\tilde{\mathbb{P}}$ of the Girsanov Theorem are equivalent. Recall that $\tilde{\mathbb{P}}$ is defined by

$$\tilde{\mathbb{P}}(A) = \int_A Z(T)\,d\mathbb{P}, \qquad A \in \mathcal{F}.$$

If $\mathbb{P}(A) = 0$, then $\int_A Z(T)\,d\mathbb{P} = 0$. Because $Z(T) > 0$ for every $\omega$, we can invert the definition of $\tilde{\mathbb{P}}$ to obtain

$$\mathbb{P}(A) = \int_A \frac{1}{Z(T)}\,d\tilde{\mathbb{P}}, \qquad A \in \mathcal{F}.$$

If $\tilde{\mathbb{P}}(A) = 0$, then $\int_A \frac{1}{Z(T)}\,d\tilde{\mathbb{P}} = 0$.


17.2 Risk-neutral measure 


As usual we are given the Brownian motion $B(t)$, $0 \le t \le T$, with filtration $\mathcal{F}(t)$, $0 \le t \le T$, defined on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$. We can then define the following.

Stock price:

$$dS(t) = \mu(t)S(t)\,dt + \sigma(t)S(t)\,dB(t).$$

The processes $\mu(t)$ and $\sigma(t)$ are adapted to the filtration. The stock price model is completely general, subject only to the condition that the paths of the process are continuous.

Interest rate: $r(t)$, $0 \le t \le T$. The process $r(t)$ is adapted.

Wealth of an agent, starting with $X(0) = x$. We can write the wealth process differential in several ways:

$$\begin{aligned} dX(t) &= \underbrace{\Delta(t)\,dS(t)}_{\text{capital gains from stock}} + \underbrace{r(t)[X(t) - \Delta(t)S(t)]\,dt}_{\text{interest earnings}} \\ &= r(t)X(t)\,dt + \Delta(t)\big[dS(t) - r(t)S(t)\,dt\big] \\ &= r(t)X(t)\,dt + \Delta(t)\underbrace{(\mu(t) - r(t))}_{\text{risk premium}}S(t)\,dt + \Delta(t)\sigma(t)S(t)\,dB(t) \\ &= r(t)X(t)\,dt + \Delta(t)\sigma(t)S(t)\Big[\underbrace{\frac{\mu(t) - r(t)}{\sigma(t)}}_{\text{market price of risk} = \theta(t)}\,dt + dB(t)\Big]. \end{aligned}$$
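Discounted processes:

$$d\Big(e^{-\int_0^t r(u)\,du}S(t)\Big) = e^{-\int_0^t r(u)\,du}\big[-r(t)S(t)\,dt + dS(t)\big],$$

$$d\Big(e^{-\int_0^t r(u)\,du}X(t)\Big) = e^{-\int_0^t r(u)\,du}\big[-r(t)X(t)\,dt + dX(t)\big].$$

Notation:

$$\beta(t) = e^{\int_0^t r(u)\,du}, \qquad \frac{1}{\beta(t)} = e^{-\int_0^t r(u)\,du},$$
$$d\beta(t) = r(t)\beta(t)\,dt, \qquad d\Big(\frac{1}{\beta(t)}\Big) = -\frac{r(t)}{\beta(t)}\,dt.$$

$$\begin{aligned} d\Big(\frac{S(t)}{\beta(t)}\Big) &= \frac{1}{\beta(t)}\big[(\mu(t) - r(t))S(t)\,dt + \sigma(t)S(t)\,dB(t)\big] \\ &= \frac{1}{\beta(t)}\sigma(t)S(t)\big[\theta(t)\,dt + dB(t)\big], \\ d\Big(\frac{X(t)}{\beta(t)}\Big) &= \Delta(t)\,d\Big(\frac{S(t)}{\beta(t)}\Big) \\ &= \frac{\Delta(t)}{\beta(t)}\sigma(t)S(t)\big[\theta(t)\,dt + dB(t)\big]. \end{aligned}$$

Then, with $\tilde{B}(t) = \int_0^t \theta(u)\,du + B(t)$ as in the Girsanov Theorem,

$$d\Big(\frac{S(t)}{\beta(t)}\Big) = \frac{1}{\beta(t)}\sigma(t)S(t)\,d\tilde{B}(t),$$
$$d\Big(\frac{X(t)}{\beta(t)}\Big) = \frac{\Delta(t)}{\beta(t)}\sigma(t)S(t)\,d\tilde{B}(t).$$

Under $\tilde{\mathbb{P}}$, $\frac{S(t)}{\beta(t)}$ and $\frac{X(t)}{\beta(t)}$ are martingales.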


Definition 17.2 (Risk-neutral measure) A risk-neutral measure (sometimes called a martingale measure) is any probability measure, equivalent to the market measure $\mathbb{P}$, which makes all discounted asset prices martingales.

For the market model considered here,

$$\tilde{\mathbb{P}}(A) = \int_A Z(T)\,d\mathbb{P}, \qquad A \in \mathcal{F},$$

where

$$Z(t) = \exp\Big\{-\int_0^t \theta(u)\,dB(u) - \tfrac12\int_0^t \theta^2(u)\,du\Big\},$$

is the unique risk-neutral measure. Note that because $\theta(t) = \frac{\mu(t) - r(t)}{\sigma(t)}$, we must assume that $\sigma(t) \neq 0$.
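In the constant-coefficient case the risk-neutral property can be tested by reweighting. Under $\mathbb{P}$ the discounted stock drifts at rate $\mu - r$, so $\mathbb{E}[S(T)/\beta(T)] = S(0)e^{(\mu - r)T}$, while weighting by $Z(T)$ with $\theta = (\mu - r)/\sigma$ should give $\tilde{\mathbb{E}}[S(T)/\beta(T)] = S(0)$. The Python sketch below (standard library only) assumes constant $\mu$, $\sigma$, $r$; sample size and parameter values are illustrative.

```python
import math
import random


def discounted_stock_checks(S0, mu, r, sigma, T, n=100000, seed=0):
    """Return (E[S(T)/beta(T)], E~[S(T)/beta(T)]) with E~ X = E[Z(T) X],
    theta = (mu - r) / sigma and beta(T) = exp(r T)."""
    rng = random.Random(seed)
    theta = (mu - r) / sigma
    plain = weighted = 0.0
    for _ in range(n):
        b = rng.gauss(0.0, math.sqrt(T))  # B(T) under P
        s_T = S0 * math.exp(sigma * b + (mu - 0.5 * sigma**2) * T)
        z_T = math.exp(-theta * b - 0.5 * theta**2 * T)
        disc = s_T * math.exp(-r * T)
        plain += disc
        weighted += z_T * disc
    return plain / n, weighted / n
```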

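Risk-neutral valuation. Consider a contingent claim paying an $\mathcal{F}(T)$-measurable random variable $V$ at time $T$.

Example 17.1

$$V = (S(T) - K)^+, \qquad \text{European call}$$
$$V = (K - S(T))^+, \qquad \text{European put}$$
$$V = \Big(\frac{1}{T}\int_0^T S(u)\,du - K\Big)^+, \qquad \text{Asian call}$$
$$V = \max_{0 \le t \le T} S(t), \qquad \text{Lookback}$$

If there is a hedging portfolio, i.e., a process $\Delta(t)$, $0 \le t \le T$, whose corresponding wealth process satisfies $X(T) = V$, then

$$X(0) = \tilde{\mathbb{E}}\Big[\frac{V}{\beta(T)}\Big].$$

This is because $\frac{X(t)}{\beta(t)}$ is a martingale under $\tilde{\mathbb{P}}$, so

$$X(0) = \tilde{\mathbb{E}}\Big[\frac{X(T)}{\beta(T)}\Big] = \tilde{\mathbb{E}}\Big[\frac{V}{\beta(T)}\Big].$$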
Chapter 18


Martingale Representation Theorem 


18.1 Martingale Representation Theorem 


See Oksendal, 4th ed., Theorem 4.11, p.50. 


Theorem 1.56 Let $B(t)$, $0 \le t \le T$, be a Brownian motion on $(\Omega, \mathcal{F}, \mathbb{P})$. Let $\mathcal{F}(t)$, $0 \le t \le T$, be the filtration generated by this Brownian motion. Let $X(t)$, $0 \le t \le T$, be a martingale (under $\mathbb{P}$) relative to this filtration. Then there is an adapted process $\delta(t)$, $0 \le t \le T$, such that

$$X(t) = X(0) + \int_0^t \delta(u)\,dB(u), \qquad 0 \le t \le T.$$

In particular, the paths of $X$ are continuous.

Remark 18.1 We already know that if $X(t)$ is a process satisfying

$$dX(t) = \delta(t)\,dB(t),$$

then $X(t)$ is a martingale. Now we see that if $X(t)$ is a martingale adapted to the filtration generated by the Brownian motion $B(t)$, i.e., the Brownian motion is the only source of randomness in $X$, then

$$dX(t) = \delta(t)\,dB(t)$$

for some $\delta(t)$.


18.2 A hedging application 


Homework Problem 4.5. In the context of Girsanov's Theorem, suppose that $\mathcal{F}(t)$, $0 \le t \le T$, is the filtration generated by the Brownian motion $B$ (under $\mathbb{P}$). Suppose that $Y$ is a $\tilde{\mathbb{P}}$-martingale. Then there is an adapted process $\gamma(t)$, $0 \le t \le T$, such that

$$Y(t) = Y(0) + \int_0^t \gamma(u)\,d\tilde{B}(u), \qquad 0 \le t \le T.$$
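$$dS(t) = \mu(t)S(t)\,dt + \sigma(t)S(t)\,dB(t),$$

$$Z(t) = \exp\Big\{-\int_0^t \theta(u)\,dB(u) - \tfrac12\int_0^t \theta^2(u)\,du\Big\},$$

$$\tilde{\mathbb{P}}(A) = \int_A Z(T)\,d\mathbb{P}, \qquad A \in \mathcal{F}.$$

Then

$$d\Big(\frac{S(t)}{\beta(t)}\Big) = \frac{S(t)}{\beta(t)}\sigma(t)\,d\tilde{B}(t).$$

Let $\Delta(t)$, $0 \le t \le T$, be a portfolio process. The corresponding wealth process $X(t)$ satisfies

$$d\Big(\frac{X(t)}{\beta(t)}\Big) = \Delta(t)\sigma(t)\frac{S(t)}{\beta(t)}\,d\tilde{B}(t),$$

i.e.,

$$\frac{X(t)}{\beta(t)} = X(0) + \int_0^t \Delta(u)\sigma(u)\frac{S(u)}{\beta(u)}\,d\tilde{B}(u), \qquad 0 \le t \le T.$$

Let $V$ be an $\mathcal{F}(T)$-measurable random variable, representing the payoff of a contingent claim at time $T$. We want to choose $X(0)$ and $\Delta(t)$, $0 \le t \le T$, so that

$$X(T) = V.$$

Define the $\tilde{\mathbb{P}}$-martingale

$$Y(t) = \tilde{\mathbb{E}}\Big[\frac{V}{\beta(T)} \,\Big|\, \mathcal{F}(t)\Big], \qquad 0 \le t \le T.$$

According to Homework Problem 4.5, there is an adapted process $\gamma(t)$, $0 \le t \le T$, such that

$$Y(t) = Y(0) + \int_0^t \gamma(u)\,d\tilde{B}(u), \qquad 0 \le t \le T.$$

Set $X(0) = Y(0) = \tilde{\mathbb{E}}\Big[\frac{V}{\beta(T)}\Big]$ and choose $\Delta(u)$ so that

$$\Delta(u)\sigma(u)\frac{S(u)}{\beta(u)} = \gamma(u).$$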
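With this choice of $\Delta(u)$, $0 \le u \le T$, we have

$$\frac{X(t)}{\beta(t)} = X(0) + \int_0^t \Delta(u)\sigma(u)\frac{S(u)}{\beta(u)}\,d\tilde{B}(u) = Y(0) + \int_0^t \gamma(u)\,d\tilde{B}(u) = Y(t), \qquad 0 \le t \le T.$$

In particular,

$$\frac{X(T)}{\beta(T)} = Y(T) = \tilde{\mathbb{E}}\Big[\frac{V}{\beta(T)} \,\Big|\, \mathcal{F}(T)\Big] = \frac{V}{\beta(T)},$$

so

$$X(T) = V.$$

The Martingale Representation Theorem guarantees the existence of a hedging portfolio, although it does not tell us how to compute it. It also justifies the risk-neutral pricing formula

$$\begin{aligned} X(t) &= \beta(t)\,\tilde{\mathbb{E}}\Big[\frac{V}{\beta(T)} \,\Big|\, \mathcal{F}(t)\Big] \\ &= \frac{\beta(t)}{Z(t)}\,\mathbb{E}\Big[\frac{Z(T)}{\beta(T)}V \,\Big|\, \mathcal{F}(t)\Big] \\ &= \frac{1}{\zeta(t)}\,\mathbb{E}\big[\zeta(T)V \,\big|\, \mathcal{F}(t)\big], \qquad 0 \le t \le T, \end{aligned}$$

where

$$\zeta(t) = \frac{Z(t)}{\beta(t)}.$$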

18.3 d-dimensional Girsanov Theorem 


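Theorem 3.57 (d-dimensional Girsanov)
• $B(t) = (B_1(t), \ldots, B_d(t))$, $0 \le t \le T$, a $d$-dimensional Brownian motion on $(\Omega, \mathcal{F}, \mathbb{P})$;
• $\mathcal{F}(t)$, $0 \le t \le T$, the accompanying filtration, perhaps larger than the one generated by $B$;
• $\theta(t) = (\theta_1(t), \ldots, \theta_d(t))$, $0 \le t \le T$, a $d$-dimensional adapted process.

For $0 \le t \le T$, define

$$\tilde{B}_j(t) = \int_0^t \theta_j(u)\,du + B_j(t), \qquad j = 1, \ldots, d,$$

$$Z(t) = \exp\Big\{-\int_0^t \theta(u)\cdot dB(u) - \tfrac12\int_0^t \|\theta(u)\|^2\,du\Big\},$$

$$\tilde{\mathbb{P}}(A) = \int_A Z(T)\,d\mathbb{P}.$$

Then, under $\tilde{\mathbb{P}}$, the process

$$\tilde{B}(t) = (\tilde{B}_1(t), \ldots, \tilde{B}_d(t)), \qquad 0 \le t \le T,$$

is a $d$-dimensional Brownian motion.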


18.4 d-dimensional Martingale Representation Theorem 


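Theorem 4.58
• $B(t) = (B_1(t), \ldots, B_d(t))$, $0 \le t \le T$, a $d$-dimensional Brownian motion on $(\Omega, \mathcal{F}, \mathbb{P})$;
• $\mathcal{F}(t)$, $0 \le t \le T$, the filtration generated by the Brownian motion $B$.

If $X(t)$, $0 \le t \le T$, is a martingale (under $\mathbb{P}$) relative to $\mathcal{F}(t)$, $0 \le t \le T$, then there is a $d$-dimensional adapted process $\delta(t) = (\delta_1(t), \ldots, \delta_d(t))$, such that

$$X(t) = X(0) + \int_0^t \delta(u)\cdot dB(u), \qquad 0 \le t \le T.$$

Corollary 4.59 If we have a $d$-dimensional adapted process $\theta(t) = (\theta_1(t), \ldots, \theta_d(t))$, then we can define $\tilde{B}$, $Z$ and $\tilde{\mathbb{P}}$ as in Girsanov's Theorem. If $Y(t)$, $0 \le t \le T$, is a martingale under $\tilde{\mathbb{P}}$ relative to $\mathcal{F}(t)$, $0 \le t \le T$, then there is a $d$-dimensional adapted process $\gamma(t) = (\gamma_1(t), \ldots, \gamma_d(t))$ such that

$$Y(t) = Y(0) + \int_0^t \gamma(u)\cdot d\tilde{B}(u), \qquad 0 \le t \le T.$$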


18.5 Multi-dimensional market model 


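Let $B(t) = (B_1(t), \ldots, B_d(t))$, $0 \le t \le T$, be a $d$-dimensional Brownian motion on some $(\Omega, \mathcal{F}, \mathbb{P})$, and let $\mathcal{F}(t)$, $0 \le t \le T$, be the filtration generated by $B$. Then we can define the following:

Stocks

$$dS_i(t) = \mu_i(t)S_i(t)\,dt + S_i(t)\sum_{j=1}^d \sigma_{ij}(t)\,dB_j(t), \qquad i = 1, \ldots, m.$$

Accumulation factor

$$\beta(t) = \exp\Big\{\int_0^t r(u)\,du\Big\}.$$

Here, $\mu_i(t)$, $\sigma_{ij}(t)$ and $r(t)$ are adapted processes.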
Discounted stock prices

$$\begin{aligned} d\Big(\frac{S_i(t)}{\beta(t)}\Big) &= \frac{1}{\beta(t)}\Big[\underbrace{(\mu_i(t) - r(t))}_{\text{risk premium}}S_i(t)\,dt + S_i(t)\sum_{j=1}^d \sigma_{ij}(t)\,dB_j(t)\Big] \\ &= \frac{S_i(t)}{\beta(t)}\sum_{j=1}^d \sigma_{ij}(t)\underbrace{\big[\theta_j(t)\,dt + dB_j(t)\big]}_{d\tilde{B}_j(t)}. \end{aligned} \tag{5.1}$$

For (5.1) to be satisfied, we need to choose $\theta_1(t), \ldots, \theta_d(t)$ so that

$$\sum_{j=1}^d \sigma_{ij}(t)\theta_j(t) = \mu_i(t) - r(t), \qquad i = 1, \ldots, m. \tag{MPR}$$

Market price of risk. The market price of risk is an adapted process $\theta(t) = (\theta_1(t), \ldots, \theta_d(t))$ satisfying the system of equations (MPR) above. There are three cases to consider:


Case I: (Unique solution.) For Lebesgue-almost every $t$ and $\mathbb{P}$-almost every $\omega$, (MPR) has a unique solution $\theta(t)$. Using $\theta(t)$ in the $d$-dimensional Girsanov Theorem, we define a unique risk-neutral probability measure $\tilde{\mathbb{P}}$. Under $\tilde{\mathbb{P}}$, every discounted stock price is a martingale. Consequently, the discounted wealth process corresponding to any portfolio process is a $\tilde{\mathbb{P}}$-martingale, and this implies that the market admits no arbitrage. Finally, the Martingale Representation Theorem can be used to show that every contingent claim can be hedged; the market is said to be complete.

Case II: (No solution.) If (MPR) has no solution, then there is no risk-neutral probability measure and the market admits arbitrage.

Case III: (Multiple solutions.) If (MPR) has multiple solutions, then there are multiple risk-neutral probability measures. The market admits no arbitrage, but there are contingent claims which cannot be hedged; the market is said to be incomplete.
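At a fixed $(t, \omega)$, (MPR) is a linear system with coefficient matrix $(\sigma_{ij})$ and right-hand side $(\mu_i - r)$, so which case occurs can be read off from ranks: no solution if appending the right-hand side raises the rank, a unique solution if the rank equals $d$, and infinitely many otherwise. This is the standard linear-algebra criterion (Rouché-Capelli), not stated in the text. The Python sketch below (standard library only) computes ranks exactly with `fractions.Fraction`; the function names and the sample inputs are hypothetical.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix (list of rows) by exact Gauss-Jordan elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    n_cols = len(m[0]) if m else 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


def classify_mpr(sigma, premium):
    """Classify sum_j sigma[i][j] theta_j = premium[i], i = 1..m, j = 1..d,
    following Cases I-III above."""
    d = len(sigma[0])
    r_sigma = rank(sigma)
    r_aug = rank([list(row) + [p] for row, p in zip(sigma, premium)])
    if r_aug > r_sigma:
        return "no solution: arbitrage"
    if r_sigma == d:
        return "unique solution: complete, no arbitrage"
    return "multiple solutions: incomplete, no arbitrage"
```

Pass exact values (integers, strings such as `"0.2"`, or `Fraction` objects) rather than floats: binary floating point can make proportional rows look independent and so change the computed rank.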


Theorem 5.60 (Fundamental Theorem of Asset Pricing) Part I. (Harrison and Pliska, Martingales and stochastic integrals in the theory of continuous trading, Stochastic Proc. and Applications 11 (1981), pp 215-260.):

If a market has a risk-neutral probability measure, then it admits no arbitrage. 


Part II. (Harrison and Pliska, A stochastic calculus model of continuous trading: complete markets, 
Stochastic Proc. and Applications 15 (1983), pp 313-316): 
The risk-neutral measure is unique if and only if every contingent claim can be hedged. 
Chapter 19


A two-dimensional market model 


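Let $B(t) = (B_1(t), B_2(t))$, $0 \le t \le T$, be a two-dimensional Brownian motion on $(\Omega, \mathcal{F}, \mathbb{P})$. Let $\mathcal{F}(t)$, $0 \le t \le T$, be the filtration generated by $B$.

In what follows, all processes can depend on $t$ and $\omega$, but are adapted to $\mathcal{F}(t)$, $0 \le t \le T$. To simplify notation, we omit the arguments whenever there is no ambiguity.

Stocks:

$$dS_1 = S_1\big[\mu_1\,dt + \sigma_1\,dB_1\big],$$
$$dS_2 = S_2\big[\mu_2\,dt + \rho\sigma_2\,dB_1 + \sqrt{1 - \rho^2}\,\sigma_2\,dB_2\big].$$

We assume $\sigma_1 > 0$, $\sigma_2 > 0$, $-1 \le \rho \le 1$. Note that

$$dS_1\,dS_1 = S_1^2\sigma_1^2\,dB_1\,dB_1 = \sigma_1^2 S_1^2\,dt,$$
$$dS_2\,dS_2 = S_2^2\rho^2\sigma_2^2\,dB_1\,dB_1 + S_2^2(1 - \rho^2)\sigma_2^2\,dB_2\,dB_2 = \sigma_2^2 S_2^2\,dt,$$
$$dS_1\,dS_2 = S_1\sigma_1 S_2\rho\sigma_2\,dB_1\,dB_1 = \rho\sigma_1\sigma_2 S_1 S_2\,dt.$$

In other words,
• $\frac{dS_1}{S_1}$ has instantaneous variance $\sigma_1^2$,
• $\frac{dS_2}{S_2}$ has instantaneous variance $\sigma_2^2$,
• $\frac{dS_1}{S_1}$ and $\frac{dS_2}{S_2}$ have instantaneous covariance $\rho\sigma_1\sigma_2$.

Accumulation factor:

$$\beta(t) = \exp\Big\{\int_0^t r\,du\Big\}.$$

The market price of risk equations are

$$\begin{aligned} \sigma_1\theta_1 &= \mu_1 - r, \\ \rho\sigma_2\theta_1 + \sqrt{1 - \rho^2}\,\sigma_2\theta_2 &= \mu_2 - r. \end{aligned} \tag{MPR}$$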
The solution to these equations is

$$\theta_1 = \frac{\mu_1 - r}{\sigma_1}, \qquad \theta_2 = \frac{\sigma_1(\mu_2 - r) - \rho\sigma_2(\mu_1 - r)}{\sigma_1\sigma_2\sqrt{1 - \rho^2}},$$

provided $-1 < \rho < 1$.
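The solution formula for the two-stock market price of risk is easy to mistype, so here is a Python sketch (standard library only; names and test values are illustrative) that evaluates it and can be checked by substituting back into (MPR).

```python
import math


def two_stock_mpr(mu1, mu2, r, sigma1, sigma2, rho):
    """Unique solution (theta1, theta2) of the two-stock (MPR) system;
    valid only for -1 < rho < 1."""
    if not (-1.0 < rho < 1.0):
        raise ValueError("formula requires -1 < rho < 1")
    theta1 = (mu1 - r) / sigma1
    theta2 = (sigma1 * (mu2 - r) - rho * sigma2 * (mu1 - r)) / (
        sigma1 * sigma2 * math.sqrt(1.0 - rho**2))
    return theta1, theta2
```

At $\rho = \pm 1$ the denominator vanishes and the formula breaks down.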
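Suppose $-1 < \rho < 1$. Then (MPR) has a unique solution $(\theta_1, \theta_2)$; we define

$$Z(t) = \exp\Big\{-\int_0^t \theta_1\,dB_1 - \int_0^t \theta_2\,dB_2 - \tfrac12\int_0^t (\theta_1^2 + \theta_2^2)\,du\Big\},$$

$$\tilde{\mathbb{P}}(A) = \int_A Z(T)\,d\mathbb{P}, \qquad \forall A \in \mathcal{F}.$$

$\tilde{\mathbb{P}}$ is the unique risk-neutral measure. Define

$$\tilde{B}_1(t) = \int_0^t \theta_1\,du + B_1(t),$$
$$\tilde{B}_2(t) = \int_0^t \theta_2\,du + B_2(t).$$

Then

$$dS_1 = S_1\big[r\,dt + \sigma_1\,d\tilde{B}_1\big],$$
$$dS_2 = S_2\big[r\,dt + \rho\sigma_2\,d\tilde{B}_1 + \sqrt{1 - \rho^2}\,\sigma_2\,d\tilde{B}_2\big].$$

We have changed the mean rates of return of the stock prices, but not the variances and covariances.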


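19.1 Hedging when $-1 < \rho < 1$

$$\begin{aligned} dX &= \Delta_1\,dS_1 + \Delta_2\,dS_2 + r(X - \Delta_1 S_1 - \Delta_2 S_2)\,dt, \\ d\Big(\frac{X}{\beta}\Big) &= \frac{1}{\beta}(dX - rX\,dt) \\ &= \frac{1}{\beta}\Delta_1(dS_1 - rS_1\,dt) + \frac{1}{\beta}\Delta_2(dS_2 - rS_2\,dt) \\ &= \frac{1}{\beta}\Delta_1 S_1\sigma_1\,d\tilde{B}_1 + \frac{1}{\beta}\Delta_2 S_2\big[\rho\sigma_2\,d\tilde{B}_1 + \sqrt{1 - \rho^2}\,\sigma_2\,d\tilde{B}_2\big]. \end{aligned}$$

Let $V$ be $\mathcal{F}(T)$-measurable. Define the $\tilde{\mathbb{P}}$-martingale

$$Y(t) = \tilde{\mathbb{E}}\Big[\frac{V}{\beta(T)} \,\Big|\, \mathcal{F}(t)\Big], \qquad 0 \le t \le T.$$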
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The Martingale Representation Corollary implies 


t ye t a 
¥(Q=¥(0)+ f vdt | dB. 
0 0 
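We have
$$d\left(\frac{X}{\beta}\right) = \left(\frac1\beta\Delta_1 S_1\sigma_1 + \frac1\beta\Delta_2 S_2\rho\sigma_2\right)d\widetilde B_1 + \frac1\beta\Delta_2 S_2\sqrt{1-\rho^2}\,\sigma_2\,d\widetilde B_2,$$
$$dY = \gamma_1\,d\widetilde B_1 + \gamma_2\,d\widetilde B_2.$$

We solve the equations
$$\frac1\beta\Delta_1 S_1\sigma_1 + \frac1\beta\Delta_2 S_2\rho\sigma_2 = \gamma_1, \qquad \frac1\beta\Delta_2 S_2\sqrt{1-\rho^2}\,\sigma_2 = \gamma_2$$
for the hedging portfolio $(\Delta_1, \Delta_2)$: the second equation determines $\Delta_2$, and then the first determines $\Delta_1$. With this choice of $(\Delta_1, \Delta_2)$ and setting
$$X(0) = Y(0) = \widetilde{\mathbb{E}}\left[\frac{V}{\beta(T)}\right],$$
we have $X(t)/\beta(t) = Y(t)$, $0 \le t \le T$, and in particular,
$$X(T) = V.$$

Every $\mathcal{F}(T)$-measurable random variable can be hedged; the market is complete.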
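Because the second equation involves only $\Delta_2$, the hedging system is triangular and can be solved by back-substitution. The Python sketch below does this at a single instant for given values of $\gamma_1, \gamma_2, \beta, S_1, S_2$. It then recomputes the $d\widetilde B_1$ and $d\widetilde B_2$ coefficients of $d(X/\beta)$ to confirm that they match $(\gamma_1, \gamma_2)$. All numerical values and function names are made up for illustration.

```python
import math


def hedge_ratios(gamma1, gamma2, beta, s1, s2, sigma1, sigma2, rho):
    """Solve the triangular hedging system for (Delta1, Delta2), -1 < rho < 1."""
    root = math.sqrt(1.0 - rho * rho)
    delta2 = beta * gamma2 / (s2 * sigma2 * root)  # from the dB2-tilde equation
    delta1 = (beta * gamma1 - delta2 * s2 * rho * sigma2) / (s1 * sigma1)  # dB1-tilde equation
    return delta1, delta2


def discounted_wealth_loadings(delta1, delta2, beta, s1, s2, sigma1, sigma2, rho):
    """Coefficients of dB1-tilde and dB2-tilde in d(X / beta)."""
    c1 = (delta1 * s1 * sigma1 + delta2 * s2 * rho * sigma2) / beta
    c2 = delta2 * s2 * math.sqrt(1.0 - rho * rho) * sigma2 / beta
    return c1, c2


# Assumed state at one instant
state = dict(beta=1.05, s1=100.0, s2=50.0, sigma1=0.2, sigma2=0.3, rho=-0.3)
d1, d2 = hedge_ratios(0.4, -0.1, **state)
print(d1, d2, discounted_wealth_loadings(d1, d2, **state))
```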


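19.2 Hedging when $\rho = 1$

The case $\rho = -1$ is analogous. Assume that $\rho = 1$. Then
$$dS_1 = S_1\left[\mu_1\,dt + \sigma_1\,dB_1\right],$$
$$dS_2 = S_2\left[\mu_2\,dt + \sigma_2\,dB_1\right].$$

The stocks are perfectly correlated.

The market price of risk equations are
$$\begin{aligned} \sigma_1\theta_1 &= \mu_1 - r, \\ \sigma_2\theta_1 &= \mu_2 - r. \end{aligned} \qquad \text{(MPR)}$$

The process $\theta_2$ is free. There are two cases: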
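Case I: $\frac{\mu_1 - r}{\sigma_1} \ne \frac{\mu_2 - r}{\sigma_2}$. There is no solution to (MPR), and consequently, there is no risk-neutral measure. This market admits arbitrage. Indeed
$$\begin{aligned} d\left(\frac{X}{\beta}\right) &= \frac1\beta\Delta_1\left(dS_1 - rS_1\,dt\right) + \frac1\beta\Delta_2\left(dS_2 - rS_2\,dt\right) \\ &= \frac1\beta\Delta_1 S_1\left[(\mu_1 - r)\,dt + \sigma_1\,dB_1\right] + \frac1\beta\Delta_2 S_2\left[(\mu_2 - r)\,dt + \sigma_2\,dB_1\right]. \end{aligned}$$
Suppose $\frac{\mu_1 - r}{\sigma_1} > \frac{\mu_2 - r}{\sigma_2}$. Set
$$\Delta_1 = \frac{1}{\sigma_1 S_1}, \qquad \Delta_2 = -\frac{1}{\sigma_2 S_2}.$$
Then
$$d\left(\frac{X}{\beta}\right) = \frac1\beta\left[\frac{\mu_1 - r}{\sigma_1}\,dt + dB_1\right] - \frac1\beta\left[\frac{\mu_2 - r}{\sigma_2}\,dt + dB_1\right] = \frac1\beta\underbrace{\left[\frac{\mu_1 - r}{\sigma_1} - \frac{\mu_2 - r}{\sigma_2}\right]}_{\text{positive}}dt.$$
The $dB_1$ terms cancel, so starting from $X(0) = 0$ this portfolio earns a riskless, strictly positive discounted profit.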
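The Case I argument can be checked numerically. The Python sketch below uses made-up parameter values that satisfy $(\mu_1 - r)/\sigma_1 > (\mu_2 - r)/\sigma_2$. It computes the $dt$ and $dB_1$ coefficients of $d(X/\beta)$ for the portfolio $\Delta_1 = 1/(\sigma_1 S_1)$, $\Delta_2 = -1/(\sigma_2 S_2)$. The noise coefficient should vanish, and the drift should equal $\frac1\beta\left[\frac{\mu_1 - r}{\sigma_1} - \frac{\mu_2 - r}{\sigma_2}\right]$.

```python
def case_one_portfolio(sigma1, sigma2, s1, s2):
    """Positions Delta1 = 1/(sigma1 S1), Delta2 = -1/(sigma2 S2) from Case I."""
    return 1.0 / (sigma1 * s1), -1.0 / (sigma2 * s2)


def discounted_wealth_terms(delta1, delta2, mu1, mu2, r, sigma1, sigma2, s1, s2, beta):
    """Return (dt coefficient, dB1 coefficient) of d(X / beta) when rho = 1."""
    drift = (delta1 * s1 * (mu1 - r) + delta2 * s2 * (mu2 - r)) / beta
    noise = (delta1 * s1 * sigma1 + delta2 * s2 * sigma2) / beta
    return drift, noise


# Assumed example: (mu1 - r)/sigma1 = 0.45 > (mu2 - r)/sigma2 = 0.2
mu1, mu2, r, sigma1, sigma2 = 0.12, 0.08, 0.03, 0.20, 0.25
s1, s2, beta = 100.0, 40.0, 1.0
d1, d2 = case_one_portfolio(sigma1, sigma2, s1, s2)
drift, noise = discounted_wealth_terms(d1, d2, mu1, mu2, r, sigma1, sigma2, s1, s2, beta)
print(drift, noise)
```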


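Case II: $\frac{\mu_1 - r}{\sigma_1} = \frac{\mu_2 - r}{\sigma_2}$. The market price of risk equations
$$\sigma_1\theta_1 = \mu_1 - r, \qquad \sigma_2\theta_1 = \mu_2 - r$$
have the solution
$$\theta_1 = \frac{\mu_1 - r}{\sigma_1} = \frac{\mu_2 - r}{\sigma_2},$$
and $\theta_2$ is free; there are infinitely many risk-neutral measures. Let $\widetilde{\mathbb{P}}$ be one of them.

Hedging:
$$\begin{aligned} d\left(\frac{X}{\beta}\right) &= \frac1\beta\Delta_1 S_1\left[(\mu_1 - r)\,dt + \sigma_1\,dB_1\right] + \frac1\beta\Delta_2 S_2\left[(\mu_2 - r)\,dt + \sigma_2\,dB_1\right] \\ &= \frac1\beta\Delta_1 S_1\sigma_1\left[\theta_1\,dt + dB_1\right] + \frac1\beta\Delta_2 S_2\sigma_2\left[\theta_1\,dt + dB_1\right] \\ &= \left(\frac1\beta\Delta_1 S_1\sigma_1 + \frac1\beta\Delta_2 S_2\sigma_2\right)d\widetilde B_1. \end{aligned}$$

Notice that $\widetilde B_2$ does not appear.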


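Let $V$ be an $\mathcal{F}(T)$-measurable random variable. If $V$ depends on $B_2$, then it can probably not be hedged. For example, if
$$V = h\left(S_1(T), S_2(T)\right),$$
and $\sigma_1$ or $\sigma_2$ depend on $B_2$, then there is trouble.

More precisely, we define the $\widetilde{\mathbb{P}}$-martingale
$$Y(t) = \widetilde{\mathbb{E}}\left[\frac{V}{\beta(T)}\,\Big|\,\mathcal{F}(t)\right], \qquad 0 \le t \le T.$$
We can write
$$Y(t) = Y(0) + \int_0^t \gamma_1\,d\widetilde B_1 + \int_0^t \gamma_2\,d\widetilde B_2,$$
so
$$dY = \gamma_1\,d\widetilde B_1 + \gamma_2\,d\widetilde B_2.$$
To get $d\left(\frac{X}{\beta}\right)$ to match $dY$, we must have
$$\gamma_2 = 0.$$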
Chapter 20


Pricing Exotic Options 


20.1 Reflection principle for Brownian motion 


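Without drift. Define
$$M(T) = \max_{0 \le t \le T} B(t).$$
Then we have:
$$\mathbb{P}\{M(T) \ge m,\ B(T) \le b\} = \mathbb{P}\{B(T) \ge 2m - b\} = \frac{1}{\sqrt{2\pi T}}\int_{2m-b}^{\infty} e^{-\frac{x^2}{2T}}\,dx, \qquad m > 0,\ b < m.$$
So the joint density is
$$\begin{aligned} \mathbb{P}\{M(T) \in dm,\ B(T) \in db\} &= -\frac{\partial^2}{\partial m\,\partial b}\left(\frac{1}{\sqrt{2\pi T}}\int_{2m-b}^{\infty} e^{-\frac{x^2}{2T}}\,dx\right)dm\,db \\ &= -\frac{\partial}{\partial m}\left(\frac{1}{\sqrt{2\pi T}}\,e^{-\frac{(2m-b)^2}{2T}}\right)dm\,db \\ &= \frac{2(2m-b)}{T\sqrt{2\pi T}}\exp\left\{-\frac{(2m-b)^2}{2T}\right\}dm\,db, \qquad m > 0,\ b < m. \end{aligned}$$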


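[Figure 20.1: Reflection principle for Brownian motion without drift. A Brownian motion path and its shadow path, reflected at level $m$; the shadow path ends at $2m - b$.]

[Figure 20.2: Possible values of $B(T)$, $M(T)$. The pair $(B(T), M(T))$ lies in the region bounded by the line $m = b$.]

With drift. Let
$$\hat B(t) = \theta t + B(t),$$
where $B(t)$, $0 \le t \le T$, is a Brownian motion (without drift) on $(\Omega, \mathcal{F}, \mathbb{P})$. Define
$$\begin{aligned} Z(T) &= \exp\left\{-\theta B(T) - \tfrac12\theta^2 T\right\} \\ &= \exp\left\{-\theta\left(\hat B(T) - \theta T\right) - \tfrac12\theta^2 T\right\} \\ &= \exp\left\{-\theta\hat B(T) + \tfrac12\theta^2 T\right\}, \end{aligned}$$
$$\hat{\mathbb{P}}(A) = \int_A Z(T)\,d\mathbb{P}, \quad \forall A \in \mathcal{F}.$$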

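Set $\hat M(T) = \max_{0 \le t \le T}\hat B(t)$.

Under $\hat{\mathbb{P}}$, $\hat B$ is a Brownian motion (without drift), so
$$\hat{\mathbb{P}}\{\hat M(T) \in d\hat m,\ \hat B(T) \in d\hat b\} = \frac{2(2\hat m - \hat b)}{T\sqrt{2\pi T}}\exp\left\{-\frac{(2\hat m - \hat b)^2}{2T}\right\}d\hat m\,d\hat b, \qquad \hat m > 0,\ \hat b < \hat m.$$
Let $h$ be any bounded function. Then
$$\mathbb{E}\left[h(\hat M(T), \hat B(T))\right] = \hat{\mathbb{E}}\left[\frac{1}{Z(T)}h(\hat M(T), \hat B(T))\right] = \hat{\mathbb{E}}\left[\exp\left\{\theta\hat B(T) - \tfrac12\theta^2 T\right\}h(\hat M(T), \hat B(T))\right].$$
Since $h$ is arbitrary, we conclude that
$$\begin{aligned} \mathbb{P}\{\hat M(T) \in d\hat m,\ \hat B(T) \in d\hat b\} &= \exp\left\{\theta\hat b - \tfrac12\theta^2 T\right\}\hat{\mathbb{P}}\{\hat M(T) \in d\hat m,\ \hat B(T) \in d\hat b\} \\ &= \frac{2(2\hat m - \hat b)}{T\sqrt{2\pi T}}\exp\left\{-\frac{(2\hat m - \hat b)^2}{2T}\right\}\exp\left\{\theta\hat b - \tfrac12\theta^2 T\right\}d\hat m\,d\hat b, \qquad \hat m > 0,\ \hat b < \hat m. \end{aligned}$$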
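Both densities can be checked numerically. The Python sketch below integrates the joint density over $b \le m$ using the composite Simpson rule; the grid sizes and truncation widths are ad hoc assumptions. In the driftless case, the result should recover the known density $\frac{2}{\sqrt{2\pi T}}e^{-m^2/(2T)}$ of $M(T)$. With drift, the joint density should still have total mass one.

```python
import math


def joint_density(m, b, T, theta=0.0):
    """Density of (M-hat(T), B-hat(T)) at (m, b) for B-hat(t) = theta*t + B(t);
    zero outside m >= 0, b <= m."""
    if m < 0.0 or b > m:
        return 0.0
    u = 2.0 * m - b
    base = 2.0 * u / (T * math.sqrt(2.0 * math.pi * T)) * math.exp(-u * u / (2.0 * T))
    return base * math.exp(theta * b - 0.5 * theta * theta * T)


def simpson(f, lo, hi, n):
    """Composite Simpson rule on [lo, hi] with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(lo + i * h)
    return total * h / 3.0


def max_density(m, T, theta=0.0, n=400):
    """Marginal density of M-hat(T) at m: integrate the joint density over b <= m."""
    width = 12.0 * math.sqrt(T) + 2.0 * abs(theta) * T  # truncates b -> -infinity
    return simpson(lambda b: joint_density(m, b, T, theta), m - width, m, n)


T = 1.0
# Without drift, M(T) has density 2/sqrt(2 pi T) * exp(-m^2 / (2T)).
print(max_density(0.7, T), 2.0 / math.sqrt(2.0 * math.pi * T) * math.exp(-0.49 / (2.0 * T)))
# With drift theta = 0.3, the joint density should still integrate to 1.
total_mass = simpson(lambda m: max_density(m, T, 0.3), 0.0, 10.0, 200)
print(total_mass)
```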
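20.2 Up and out European call.

Let $0 < K < L$ be given. The payoff at time $T$ is
$$(S(T) - K)^+\,\mathbb{1}_{\{S^*(T) < L\}},$$
where
$$S^*(T) = \max_{0 \le t \le T} S(t).$$

To simplify notation, assume that $\mathbb{P}$ is already the risk-neutral measure, so the value at time zero of the option is
$$v(0, S(0)) = e^{-rT}\,\mathbb{E}\left[(S(T) - K)^+\,\mathbb{1}_{\{S^*(T) < L\}}\right].$$
Because $\mathbb{P}$ is the risk-neutral measure,
$$\begin{aligned} dS(t) &= rS(t)\,dt + \sigma S(t)\,dB(t), \\ S(t) &= S(0)\exp\left\{\sigma B(t) + \left(r - \tfrac12\sigma^2\right)t\right\} \\ &= S(0)\exp\left\{\sigma\left[B(t) + \left(\frac{r}{\sigma} - \frac{\sigma}{2}\right)t\right]\right\} \\ &= S(0)\exp\left\{\sigma\hat B(t)\right\}, \end{aligned}$$
where
$$\theta = \frac{r}{\sigma} - \frac{\sigma}{2}, \qquad \hat B(t) = \theta t + B(t).$$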

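Consequently,
$$S^*(t) = S(0)\exp\left\{\sigma\hat M(t)\right\},$$
where
$$\hat M(t) = \max_{0 \le u \le t}\hat B(u).$$
We compute
$$\begin{aligned} v(0, S(0)) &= e^{-rT}\,\mathbb{E}\left[(S(T) - K)^+\,\mathbb{1}_{\{S^*(T) < L\}}\right] \\ &= e^{-rT}\,\mathbb{E}\left[\left(S(0)\exp\{\sigma\hat B(T)\} - K\right)^+\mathbb{1}_{\{S(0)\exp\{\sigma\hat M(T)\} < L\}}\right] \\ &= e^{-rT}\,\mathbb{E}\left[\left(S(0)\exp\{\sigma\hat B(T)\} - K\right)\mathbb{1}_{\{\hat B(T) > \tilde b,\ \hat M(T) < \tilde m\}}\right], \end{aligned}$$
where
$$\tilde b = \frac1\sigma\log\frac{K}{S(0)}, \qquad \tilde m = \frac1\sigma\log\frac{L}{S(0)}.$$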
[Figure 20.3: Possible values of $\hat B(T)$, $\hat M(T)$.]


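We consider only the case
$$S(0) \le K < L, \quad \text{so } 0 \le \tilde b < \tilde m.$$
The other case, $K < S(0) < L$, leads to $\tilde b < 0 < \tilde m$, and the analysis is similar.
We compute $\int_{\tilde b}^{\tilde m}\int_x^{\tilde m}\cdots\,dy\,dx$:
$$\begin{aligned} v(0, S(0)) &= e^{-rT}\int_{\tilde b}^{\tilde m}\int_x^{\tilde m}\left(S(0)e^{\sigma x} - K\right)\frac{2(2y - x)}{T\sqrt{2\pi T}}\exp\left\{-\frac{(2y - x)^2}{2T} + \theta x - \tfrac12\theta^2 T\right\}dy\,dx \\ &= e^{-rT}\int_{\tilde b}^{\tilde m}\left(S(0)e^{\sigma x} - K\right)\left[-\frac{1}{\sqrt{2\pi T}}\exp\left\{-\frac{(2y - x)^2}{2T} + \theta x - \tfrac12\theta^2 T\right\}\right]_{y=x}^{y=\tilde m}dx \\ &= \frac{1}{\sqrt{2\pi T}}\,e^{-rT}\int_{\tilde b}^{\tilde m}\left(S(0)e^{\sigma x} - K\right)\left[\exp\left\{-\frac{x^2}{2T} + \theta x - \tfrac12\theta^2 T\right\} - \exp\left\{-\frac{(2\tilde m - x)^2}{2T} + \theta x - \tfrac12\theta^2 T\right\}\right]dx \\ &= \frac{1}{\sqrt{2\pi T}}\,S(0)\int_{\tilde b}^{\tilde m}\exp\left\{\sigma x - \frac{x^2}{2T} + \theta x - \tfrac12\theta^2 T - rT\right\}dx \\ &\quad - \frac{1}{\sqrt{2\pi T}}\,e^{-rT}K\int_{\tilde b}^{\tilde m}\exp\left\{-\frac{x^2}{2T} + \theta x - \tfrac12\theta^2 T\right\}dx \\ &\quad - \frac{1}{\sqrt{2\pi T}}\,S(0)\int_{\tilde b}^{\tilde m}\exp\left\{\sigma x - \frac{(2\tilde m - x)^2}{2T} + \theta x - \tfrac12\theta^2 T - rT\right\}dx \\ &\quad + \frac{1}{\sqrt{2\pi T}}\,e^{-rT}K\int_{\tilde b}^{\tilde m}\exp\left\{-\frac{(2\tilde m - x)^2}{2T} + \theta x - \tfrac12\theta^2 T\right\}dx. \end{aligned}$$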


The standard method for all these integrals is to complete the square in the exponent and then 
recognize a cumulative normal distribution. We carry out the details for the first integral and just 
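give the result for the other three. The exponent in the first integrand is
$$\begin{aligned} \sigma x - \frac{x^2}{2T} + \theta x - \tfrac12\theta^2 T - rT &= -\frac{1}{2T}\left(x - \frac{rT}{\sigma} - \frac{\sigma T}{2}\right)^2 + \frac{T}{2}\left[\left(\frac{r}{\sigma} + \frac{\sigma}{2}\right)^2 - \left(\frac{r}{\sigma} - \frac{\sigma}{2}\right)^2\right] - rT \\ &= -\frac{1}{2T}\left(x - \frac{rT}{\sigma} - \frac{\sigma T}{2}\right)^2. \end{aligned}$$
In the first integral we make the change of variable
$$y = \left(x - \frac{rT}{\sigma} - \frac{\sigma T}{2}\right)\Big/\sqrt T, \qquad dy = dx/\sqrt T,$$
to obtain
$$\begin{aligned} \frac{1}{\sqrt{2\pi T}}\,S(0)\int_{\tilde b}^{\tilde m}\exp\left\{\sigma x - \frac{x^2}{2T} + \theta x - \tfrac12\theta^2 T - rT\right\}dx &= \frac{1}{\sqrt{2\pi T}}\,S(0)\int_{\tilde b}^{\tilde m}\exp\left\{-\frac{1}{2T}\left(x - \frac{rT}{\sigma} - \frac{\sigma T}{2}\right)^2\right\}dx \\ &= \frac{1}{\sqrt{2\pi}}\,S(0)\int_{\frac{\tilde b}{\sqrt T} - \frac{r\sqrt T}{\sigma} - \frac{\sigma\sqrt T}{2}}^{\frac{\tilde m}{\sqrt T} - \frac{r\sqrt T}{\sigma} - \frac{\sigma\sqrt T}{2}} e^{-\frac{y^2}{2}}\,dy \\ &= S(0)\left[N\left(-\frac{\tilde b}{\sqrt T} + \frac{r\sqrt T}{\sigma} + \frac{\sigma\sqrt T}{2}\right) - N\left(-\frac{\tilde m}{\sqrt T} + \frac{r\sqrt T}{\sigma} + \frac{\sigma\sqrt T}{2}\right)\right]. \end{aligned}$$
Treating the other three integrals the same way gives
$$\begin{aligned} v(0, S(0)) &= S(0)\left[N\left(-\frac{\tilde b}{\sqrt T} + \frac{r\sqrt T}{\sigma} + \frac{\sigma\sqrt T}{2}\right) - N\left(-\frac{\tilde m}{\sqrt T} + \frac{r\sqrt T}{\sigma} + \frac{\sigma\sqrt T}{2}\right)\right] \\ &\quad - e^{-rT}K\left[N\left(-\frac{\tilde b}{\sqrt T} + \frac{r\sqrt T}{\sigma} - \frac{\sigma\sqrt T}{2}\right) - N\left(-\frac{\tilde m}{\sqrt T} + \frac{r\sqrt T}{\sigma} - \frac{\sigma\sqrt T}{2}\right)\right] \\ &\quad - S(0)\exp\left\{2\tilde m\left(\frac{r}{\sigma} + \frac{\sigma}{2}\right)\right\}\left[N\left(\frac{2\tilde m - \tilde b}{\sqrt T} + \frac{r\sqrt T}{\sigma} + \frac{\sigma\sqrt T}{2}\right) - N\left(\frac{\tilde m}{\sqrt T} + \frac{r\sqrt T}{\sigma} + \frac{\sigma\sqrt T}{2}\right)\right] \\ &\quad + e^{-rT}K\exp\left\{2\tilde m\left(\frac{r}{\sigma} - \frac{\sigma}{2}\right)\right\}\left[N\left(\frac{2\tilde m - \tilde b}{\sqrt T} + \frac{r\sqrt T}{\sigma} - \frac{\sigma\sqrt T}{2}\right) - N\left(\frac{\tilde m}{\sqrt T} + \frac{r\sqrt T}{\sigma} - \frac{\sigma\sqrt T}{2}\right)\right], \end{aligned}$$
where
$$\tilde b = \frac1\sigma\log\frac{K}{S(0)}, \qquad \tilde m = \frac1\sigma\log\frac{L}{S(0)}, \qquad N(z) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{z}e^{-y^2/2}\,dy.$$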
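The four-term expression for $v(0, S(0))$, obtained by completing the square in each integral, can be cross-checked against a direct numerical evaluation of the single $x$-integral that remains after the $y$-integration. The Python sketch below assumes the case $S(0) \le K < L$ treated in the text, and its parameter values are illustrative. The last check compares the large-$L$ limit with the Black-Scholes formula.

```python
import math


def norm_cdf(z):
    """Standard normal cumulative distribution function N(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def up_and_out_call(s0, K, L, r, sigma, T):
    """Four-term closed form for v(0, S(0)); valid for S(0) <= K < L."""
    if not 0.0 < s0 <= K < L:
        raise ValueError("formula derived only for S(0) <= K < L")
    b = math.log(K / s0) / sigma
    m = math.log(L / s0) / sigma
    rt = math.sqrt(T)
    plus = r * rt / sigma + sigma * rt / 2.0   # r sqrt(T)/sigma + sigma sqrt(T)/2
    minus = r * rt / sigma - sigma * rt / 2.0  # r sqrt(T)/sigma - sigma sqrt(T)/2
    N = norm_cdf
    disc = math.exp(-r * T)
    return (s0 * (N(-b / rt + plus) - N(-m / rt + plus))
            - disc * K * (N(-b / rt + minus) - N(-m / rt + minus))
            - s0 * math.exp(2.0 * m * (r / sigma + sigma / 2.0))
            * (N((2.0 * m - b) / rt + plus) - N(m / rt + plus))
            + disc * K * math.exp(2.0 * m * (r / sigma - sigma / 2.0))
            * (N((2.0 * m - b) / rt + minus) - N(m / rt + minus)))


def up_and_out_call_quadrature(s0, K, L, r, sigma, T, n=4000):
    """Simpson's rule on the x-integral left after integrating out y."""
    b = math.log(K / s0) / sigma
    m = math.log(L / s0) / sigma
    theta = r / sigma - sigma / 2.0
    c = 1.0 / math.sqrt(2.0 * math.pi * T)

    def f(x):
        kernel = c * (math.exp(-x * x / (2.0 * T)) - math.exp(-(2.0 * m - x) ** 2 / (2.0 * T)))
        return (s0 * math.exp(sigma * x) - K) * kernel * math.exp(theta * x - 0.5 * theta ** 2 * T)

    h = (m - b) / n
    total = f(b) + f(m)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(b + i * h)
    return math.exp(-r * T) * total * h / 3.0


def black_scholes_call(s0, K, r, sigma, T):
    d_plus = (math.log(s0 / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d_minus = d_plus - sigma * math.sqrt(T)
    return s0 * norm_cdf(d_plus) - K * math.exp(-r * T) * norm_cdf(d_minus)


args = dict(s0=90.0, K=100.0, r=0.05, sigma=0.2, T=1.0)  # assumed inputs
print(up_and_out_call(L=130.0, **args), up_and_out_call_quadrature(L=130.0, **args))
print(up_and_out_call(L=1000.0, **args), black_scholes_call(**args))
```

The two numbers on each printed line should agree closely. The first line is a consistency check of the completing-the-square algebra. The second line illustrates the $L \to \infty$ limit discussed next.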
[Figure 20.4: Initial and boundary conditions: $v(T, x) = (x - K)^+$ and $v(t, 0) = 0$.]


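If we let $L \to \infty$, we obtain the classical Black-Scholes formula
$$\begin{aligned} v(0, S(0)) &= S(0)\left[1 - N\left(\frac{\tilde b}{\sqrt T} - \frac{r\sqrt T}{\sigma} - \frac{\sigma\sqrt T}{2}\right)\right] - e^{-rT}K\left[1 - N\left(\frac{\tilde b}{\sqrt T} - \frac{r\sqrt T}{\sigma} + \frac{\sigma\sqrt T}{2}\right)\right] \\ &= S(0)\,N\left(\frac{1}{\sigma\sqrt T}\log\frac{S(0)}{K} + \frac{r\sqrt T}{\sigma} + \frac{\sigma\sqrt T}{2}\right) - e^{-rT}K\,N\left(\frac{1}{\sigma\sqrt T}\log\frac{S(0)}{K} + \frac{r\sqrt T}{\sigma} - \frac{\sigma\sqrt T}{2}\right). \end{aligned}$$

If we replace $T$ by $T - t$ and replace $S(0)$ by $x$ in the formula for $v(0, S(0))$, we obtain a formula for $v(t, x)$, the value of the option at time $t$ if $S(t) = x$. We have actually derived the formula under the assumption $x \le K < L$, but a similar, albeit longer, formula can also be derived for $K < x < L$. We consider the function
$$v(t, x) = \mathbb{E}^{t,x}\left[e^{-r(T-t)}(S(T) - K)^+\,\mathbb{1}_{\{S^*(T) < L\}}\right], \qquad 0 \le t \le T,\ 0 \le x \le L,$$
where now $S^*(T) = \max_{t \le u \le T} S(u)$.
This function satisfies the terminal condition
$$v(T, x) = (x - K)^+, \qquad 0 \le x < L,$$
and the boundary conditions
$$v(t, 0) = 0, \qquad v(t, L) = 0, \qquad 0 \le t \le T.$$
We show that $v$ satisfies the Black-Scholes equation
$$-rv + v_t + rxv_x + \tfrac12\sigma^2 x^2 v_{xx} = 0, \qquad 0 \le t < T,\ 0 < x < L.$$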
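Let $S(0) > 0$ be given and define the stopping time
$$\tau = \min\{t \ge 0;\ S(t) = L\}.$$
Theorem 2.61 The process
$$e^{-r(t \wedge \tau)}v\left(t \wedge \tau, S(t \wedge \tau)\right), \qquad 0 \le t \le T,$$
is a martingale.

Proof: First note that
$$S^*(T) < L \iff \tau > T.$$
Let $\omega \in \Omega$ be given, and choose $t \in [0, T]$. If $\tau(\omega) \le t$, then
$$\mathbb{E}\left[e^{-rT}(S(T) - K)^+\,\mathbb{1}_{\{S^*(T) < L\}}\,\Big|\,\mathcal{F}(t)\right](\omega) = 0.$$
But when $\tau(\omega) \le t$, we have
$$v\left(t \wedge \tau(\omega), S(t \wedge \tau(\omega), \omega)\right) = v(\tau(\omega), L) = 0,$$
so we may write
$$\mathbb{E}\left[e^{-rT}(S(T) - K)^+\,\mathbb{1}_{\{S^*(T) < L\}}\,\Big|\,\mathcal{F}(t)\right](\omega) = e^{-r(t \wedge \tau(\omega))}v\left(t \wedge \tau(\omega), S(t \wedge \tau(\omega), \omega)\right).$$
On the other hand, if $\tau(\omega) > t$, then the Markov property implies
$$\begin{aligned} \mathbb{E}\left[e^{-rT}(S(T) - K)^+\,\mathbb{1}_{\{S^*(T) < L\}}\,\Big|\,\mathcal{F}(t)\right](\omega) &= e^{-rt}\,\mathbb{E}^{t, S(t,\omega)}\left[e^{-r(T-t)}(S(T) - K)^+\,\mathbb{1}_{\{S^*(T) < L\}}\right] \\ &= e^{-rt}v(t, S(t, \omega)) \\ &= e^{-r(t \wedge \tau(\omega))}v\left(t \wedge \tau(\omega), S(t \wedge \tau(\omega), \omega)\right). \end{aligned}$$
In both cases, we have
$$e^{-r(t \wedge \tau)}v\left(t \wedge \tau, S(t \wedge \tau)\right) = \mathbb{E}\left[e^{-rT}(S(T) - K)^+\,\mathbb{1}_{\{S^*(T) < L\}}\,\Big|\,\mathcal{F}(t)\right].$$
Suppose $0 \le u \le t \le T$. Then
$$\begin{aligned} \mathbb{E}\left[e^{-r(t \wedge \tau)}v\left(t \wedge \tau, S(t \wedge \tau)\right)\Big|\,\mathcal{F}(u)\right] &= \mathbb{E}\left[\mathbb{E}\left[e^{-rT}(S(T) - K)^+\,\mathbb{1}_{\{S^*(T) < L\}}\,\Big|\,\mathcal{F}(t)\right]\Big|\,\mathcal{F}(u)\right] \\ &= \mathbb{E}\left[e^{-rT}(S(T) - K)^+\,\mathbb{1}_{\{S^*(T) < L\}}\,\Big|\,\mathcal{F}(u)\right] \\ &= e^{-r(u \wedge \tau)}v\left(u \wedge \tau, S(u \wedge \tau)\right). \end{aligned}$$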
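For $0 \le t \le T$, we compute the differential
$$d\left(e^{-rt}v(t, S(t))\right) = e^{-rt}\left(-rv + v_t + rSv_x + \tfrac12\sigma^2 S^2 v_{xx}\right)dt + e^{-rt}\sigma S v_x\,dB.$$
Integrate from $0$ to $t \wedge \tau$:
$$e^{-r(t \wedge \tau)}v\left(t \wedge \tau, S(t \wedge \tau)\right) = v(0, S(0)) + \int_0^{t \wedge \tau} e^{-ru}\left(-rv + v_t + rSv_x + \tfrac12\sigma^2 S^2 v_{xx}\right)du + \int_0^{t \wedge \tau} e^{-ru}\sigma S v_x\,dB.$$
The last term is a martingale, because a stopped martingale is still a martingale.

Because $e^{-r(t \wedge \tau)}v(t \wedge \tau, S(t \wedge \tau))$ is also a martingale, the Riemann integral
$$\int_0^{t \wedge \tau} e^{-ru}\left(-rv + v_t + rSv_x + \tfrac12\sigma^2 S^2 v_{xx}\right)du$$
is a martingale. A continuous martingale given by a Riemann integral has zero quadratic variation, so it is constant; since it starts at zero, its integrand vanishes. Therefore,
$$-rv(u, S(u)) + v_t(u, S(u)) + rS(u)v_x(u, S(u)) + \tfrac12\sigma^2 S^2(u)v_{xx}(u, S(u)) = 0, \qquad 0 \le u \le t \wedge \tau.$$
The PDE
$$-rv + v_t + rxv_x + \tfrac12\sigma^2 x^2 v_{xx} = 0, \qquad 0 \le t < T,\ 0 < x < L,$$
then follows.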


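The Hedge
$$d\left(e^{-rt}v(t, S(t))\right) = e^{-rt}\sigma S(t)v_x(t, S(t))\,dB(t), \qquad 0 \le t \le \tau.$$
Let $X(t)$ be the wealth process corresponding to some portfolio $\Delta(t)$. Then
$$d\left(e^{-rt}X(t)\right) = e^{-rt}\Delta(t)\sigma S(t)\,dB(t).$$
We should take
$$X(0) = v(0, S(0))$$
and
$$\Delta(t) = v_x(t, S(t)), \qquad 0 \le t \le T \wedge \tau.$$
Then
$$X(T \wedge \tau) = v\left(T \wedge \tau, S(T \wedge \tau)\right) = \begin{cases} v(T, S(T)) = (S(T) - K)^+ & \text{if } \tau > T, \\ v(\tau, L) = 0 & \text{if } \tau \le T. \end{cases}$$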
[Figure 20.5: Practical issue. Top: $v(T, x)$; bottom: $v(t, x)$ for $t$ near $T$. Both are plotted against $x$, with $0$, $K$, and $L$ marked.]


20.3 A practical issue 


For $t < T$ but $t$ near $T$, $v(t, x)$ has the form shown in the bottom part of Fig. 20.5. In particular, the hedging portfolio
$$\Delta(t) = v_x(t, S(t))$$


can become very negative near the knockout boundary. The hedger is in an unstable situation. He 
should take a large short position in the stock. If the stock does not cross the barrier L, he covers 
this short position with funds from the money market, pays off the option, and is left with zero. If 
the stock moves across the barrier, he is now in a region of $\Delta(t) = v_x(t, S(t))$ near zero. He should
cover his short position with the money market. This is more expensive than before, because the 
stock price has risen, and consequently he is left with no money. However, the option has “knocked 
out”, so no money is needed to pay it off. 


Because a large short position is being taken, a small error in hedging can create a significant effect. 
Here is a possible resolution. 


Rather than using the boundary condition
$$v(t, L) = 0, \qquad 0 \le t \le T,$$
solve the PDE with the boundary condition
$$v(t, L) + \alpha L v_x(t, L) = 0, \qquad 0 \le t \le T,$$
where $\alpha$ is a "tolerance parameter", say 1%. At the boundary, $Lv_x(t, L)$ is the dollar size of the short position. The new boundary condition guarantees:

1. $Lv_x(t, L)$ remains bounded;
2. The value of the portfolio is always sufficient to cover a hedging error of $\alpha$ times the dollar size of the short position.
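The effect of the tolerance boundary condition can be explored numerically. The Python sketch below is an illustrative explicit finite-difference solver for the Black-Scholes PDE on $0 \le x \le L$; this method is not taken from the text. The grid sizes, the one-sided difference used for $v_x(t, L)$, and the parameter values are all assumptions. With $\alpha = 1\%$, the boundary value is no longer forced to zero, so the computed option value comes out higher than with $v(t, L) = 0$.

```python
def up_and_out_fd(s0, K, L, r, sigma, T, alpha=None, n_space=130, n_time=1000):
    """Explicit finite differences, stepping backward from v(T, x) = (x - K)^+ on [0, L].

    alpha=None: boundary condition v(t, L) = 0.
    alpha > 0:  boundary condition v(t, L) + alpha * L * v_x(t, L) = 0, with v_x
                replaced by a one-sided difference (an illustrative discretization).
    Returns (value at x = s0, dollar short position L * v_x(0, L)).
    """
    dx = L / n_space
    dt = T / n_time  # explicit scheme: keep dt * sigma^2 * n_space^2 below 1
    xs = [i * dx for i in range(n_space + 1)]

    def apply_boundary(v):
        v[0] = 0.0  # v(t, 0) = 0
        if alpha is None:
            v[-1] = 0.0
        else:
            k = alpha * L / dx
            v[-1] = k / (1.0 + k) * v[-2]  # solves v_N + k (v_N - v_{N-1}) = 0

    v = [max(x - K, 0.0) for x in xs]
    apply_boundary(v)
    for _ in range(n_time):
        new = v[:]
        for i in range(1, n_space):
            v_x = (v[i + 1] - v[i - 1]) / (2.0 * dx)
            v_xx = (v[i + 1] - 2.0 * v[i] + v[i - 1]) / dx ** 2
            # v(t - dt) = v(t) + dt * (r x v_x + 0.5 sigma^2 x^2 v_xx - r v)
            new[i] = v[i] + dt * (r * xs[i] * v_x
                                  + 0.5 * sigma ** 2 * xs[i] ** 2 * v_xx - r * v[i])
        apply_boundary(new)
        v = new
    j = min(int(s0 / dx), n_space - 1)
    w = (s0 - xs[j]) / dx
    value = (1.0 - w) * v[j] + w * v[j + 1]  # linear interpolation at x = s0
    dollar_short = L * (v[-1] - v[-2]) / dx
    return value, dollar_short


params = dict(s0=100.0, K=100.0, L=130.0, r=0.05, sigma=0.2, T=1.0)  # assumed inputs
hard, hard_short = up_and_out_fd(**params)
soft, soft_short = up_and_out_fd(alpha=0.01, **params)
print(hard, soft, hard_short, soft_short)
```

The explicit scheme is chosen for brevity, not efficiency. An implicit or Crank-Nicolson scheme would allow much larger time steps.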


Chapter 21 


Asian Options 


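Stock:
$$dS(t) = rS(t)\,dt + \sigma S(t)\,dB(t).$$

Payoff:
$$V = h\left(\int_0^T S(t)\,dt\right).$$

Value of the payoff at time zero:
$$X(0) = \mathbb{E}\left[e^{-rT}h\left(\int_0^T S(t)\,dt\right)\right].$$

Introduce an auxiliary process $Y(t)$ by specifying
$$dY(t) = S(t)\,dt.$$
With the initial conditions
$$S(t) = x, \qquad Y(t) = y,$$
we have the solutions
$$S(T) = x\exp\left\{\sigma\left(B(T) - B(t)\right) + \left(r - \tfrac12\sigma^2\right)(T - t)\right\}, \qquad Y(T) = y + \int_t^T S(u)\,du.$$

Define the undiscounted expected payoff
$$u(t, x, y) = \mathbb{E}^{t,x,y}h(Y(T)), \qquad 0 \le t \le T,\ x \ge 0,\ y \in \mathbb{R}.$$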
21.1 Feynman-Kac Theorem

The function $u$ satisfies the PDE
$$u_t + rxu_x + \tfrac12\sigma^2 x^2 u_{xx} + xu_y = 0, \qquad 0 \le t < T,\ x > 0,\ y \in \mathbb{R},$$
the terminal condition
$$u(T, x, y) = h(y), \qquad x \ge 0,\ y \in \mathbb{R},$$
and the boundary condition
$$u(t, 0, y) = h(y), \qquad 0 \le t \le T,\ y \in \mathbb{R}.$$

One can solve this equation. Then
$$v\left(t, S(t), \int_0^t S(u)\,du\right)$$
is the option value at time $t$, where
$$v(t, x, y) = e^{-r(T-t)}u(t, x, y).$$

The PDE for $v$ is
$$-rv + v_t + rxv_x + \tfrac12\sigma^2 x^2 v_{xx} + xv_y = 0, \qquad (1.1)$$
$$v(T, x, y) = h(y), \qquad v(t, 0, y) = e^{-r(T-t)}h(y).$$

One can solve this equation rather than the equation for $u$.


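21.2 Constructing the hedge

Start with the stock price $S(0)$. The differential of the value $X(t)$ of a portfolio $\Delta(t)$ is
$$\begin{aligned} dX &= \Delta\,dS + r(X - \Delta S)\,dt \\ &= \Delta S(r\,dt + \sigma\,dB) + rX\,dt - r\Delta S\,dt \\ &= \Delta\sigma S\,dB + rX\,dt. \end{aligned}$$
We want to have
$$X(t) = v\left(t, S(t), \int_0^t S(u)\,du\right),$$
so that
$$X(T) = v\left(T, S(T), \int_0^T S(u)\,du\right) = h\left(\int_0^T S(u)\,du\right).$$
The differential of the value of the option is
$$\begin{aligned} dv\left(t, S(t), \int_0^t S(u)\,du\right) &= v_t\,dt + v_x\,dS + v_y S\,dt + \tfrac12 v_{xx}\,dS\,dS \\ &= \left(v_t + rSv_x + Sv_y + \tfrac12\sigma^2 S^2 v_{xx}\right)dt + \sigma S v_x\,dB \\ &= rv\left(t, S(t), \int_0^t S(u)\,du\right)dt + v_x\left(t, S(t), \int_0^t S(u)\,du\right)\sigma S(t)\,dB(t). \qquad \text{(From Eq. 1.1)} \end{aligned}$$
Compare this with
$$dX(t) = rX(t)\,dt + \Delta(t)\sigma S(t)\,dB(t).$$
Take $\Delta(t) = v_x\left(t, S(t), \int_0^t S(u)\,du\right)$. If $X(0) = v(0, S(0), 0)$, then
$$X(t) = v\left(t, S(t), \int_0^t S(u)\,du\right), \qquad 0 \le t \le T,$$
because both these processes satisfy the same stochastic differential equation, starting from the same initial condition.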
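For a concrete payoff, the value $X(0)$ can also be approximated by simulation, which gives a check on any PDE solution for $v$. The Python sketch below assumes the arithmetic-average call $h(y) = (y/T - K)^+$; the text leaves $h$ general. It simulates $S$ exactly on a time grid under the risk-neutral dynamics and replaces $Y(T) = \int_0^T S(u)\,du$ by a left Riemann sum. The step count, path count, and seed are arbitrary choices.

```python
import math
import random


def asian_call_mc(s0, K, r, sigma, T, n_paths=10_000, n_steps=50, seed=1):
    """Monte Carlo estimate of X(0) = E[e^{-rT} h(Y(T))] for the assumed payoff
    h(y) = (y / T - K)^+, i.e. an arithmetic-average call.

    S is simulated exactly on the grid under dS = r S dt + sigma S dB, and
    Y(T) = int_0^T S(u) du is approximated by a left Riemann sum (dY = S dt).
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    dt = T / n_steps
    drift = (r - 0.5 * sigma ** 2) * dt
    vol = sigma * math.sqrt(dt)
    payoff_sum = 0.0
    for _ in range(n_paths):
        s, y = s0, 0.0
        for _ in range(n_steps):
            y += s * dt
            s *= math.exp(drift + vol * rng.gauss(0.0, 1.0))
        payoff_sum += max(y / T - K, 0.0)
    return math.exp(-r * T) * payoff_sum / n_paths


print(asian_call_mc(100.0, 100.0, 0.05, 0.2, 1.0))
```

Re-running with the same seed reuses the same paths, so estimates for different strikes can be compared without extra Monte Carlo noise.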


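21.3 Partial average payoff Asian option

Now suppose the payoff is
$$V = h\left(\int_\tau^T S(t)\,dt\right),$$
where $0 < \tau < T$. We compute
$$v(t, x, y) = \mathbb{E}^{t,x,y}\left[e^{-r(T-t)}h(Y(T))\right], \qquad \tau \le t \le T,$$
just as before. For $0 \le t \le \tau$, we next compute the value of a derivative security which pays off
$$v(\tau, S(\tau), 0)$$
at time $\tau$. This value is
$$w(t, x) = \mathbb{E}^{t,x}\left[e^{-r(\tau - t)}v(\tau, S(\tau), 0)\right].$$
The function $w$ satisfies the Black-Scholes PDE
$$-rw + w_t + rxw_x + \tfrac12\sigma^2 x^2 w_{xx} = 0, \qquad 0 \le t < \tau,\ x > 0,$$
with terminal condition
$$w(\tau, x) = v(\tau, x, 0), \qquad x \ge 0,$$
and boundary condition
$$w(t, 0) = e^{-r(T-t)}h(0), \qquad 0 \le t \le \tau.$$
The hedge is given by
$$\Delta(t) = \begin{cases} w_x(t, S(t)), & 0 \le t \le \tau, \\ v_x\left(t, S(t), \int_\tau^t S(u)\,du\right), & \tau < t \le T. \end{cases}$$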
Remark 21.1 While no closed form for the Asian option price is known, the Laplace transform (in the variable $\frac{\sigma^2}{4}(T - t)$) has been computed. See H. Geman and M. Yor, Bessel processes, Asian options, and perpetuities, Math. Finance 3 (1993), 349-375.


Chapter 22 


Summary of Arbitrage Pricing Theory 


A simple European derivative security makes a random payment at a time fixed in advance. The 
value at time t of such a security is the amount of wealth needed at time t in order to replicate the
security by trading in the market. The hedging portfolio is a specification of how to do this trading. 


22.1 Binomial model, Hedging Portfolio 


Let $\Omega$ be the set of all possible sequences of $n$ coin-tosses. We have no probabilities at this point. Let $r \ge 0$, $u > r + 1$, $d = 1/u$ be given. (See Fig. 2.1)

Evolution of the value of a portfolio:
$$X_{k+1} = \Delta_k S_{k+1} + (1 + r)(X_k - \Delta_k S_k).$$

Given a simple European derivative security $V(\omega_1, \omega_2)$, we want to start with a nonrandom $X_0$ and use a portfolio process
$$\Delta_0,\ \Delta_1(H),\ \Delta_1(T)$$
so that
$$X_2(\omega_1, \omega_2) = V(\omega_1, \omega_2) \quad \forall \omega_1, \omega_2. \qquad \text{(four equations)}$$
There are four unknowns: $X_0$, $\Delta_0$, $\Delta_1(H)$, $\Delta_1(T)$. Solving the equations, we obtain:
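$$\begin{aligned} X_1(\omega_1) &= \frac{1}{1+r}\left[\frac{1+r-d}{u-d}X_2(\omega_1, H) + \frac{u-(1+r)}{u-d}X_2(\omega_1, T)\right] \\ &= \frac{1}{1+r}\left[\frac{1+r-d}{u-d}V(\omega_1, H) + \frac{u-(1+r)}{u-d}V(\omega_1, T)\right], \\ \Delta_1(\omega_1) &= \frac{X_2(\omega_1, H) - X_2(\omega_1, T)}{S_2(\omega_1, H) - S_2(\omega_1, T)}, \\ X_0 &= \frac{1}{1+r}\left[\frac{1+r-d}{u-d}X_1(H) + \frac{u-(1+r)}{u-d}X_1(T)\right], \\ \Delta_0 &= \frac{X_1(H) - X_1(T)}{S_1(H) - S_1(T)}. \end{aligned}$$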
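The backward recursion above can be verified path by path. The Python sketch below computes $X_0, \Delta_0, \Delta_1(H), \Delta_1(T)$ from these formulas. It then runs the wealth equation $X_{k+1} = \Delta_k S_{k+1} + (1+r)(X_k - \Delta_k S_k)$ along each of the four coin-toss paths and compares $X_2$ with $V$. The function names and the sample call option ($S_0 = 4$, $u = 2$, $d = 1/2$, $r = 1/4$, strike 5) are illustrative assumptions.

```python
from itertools import product


def replicate_two_period(s0, u, d, r, payoff):
    """Solve the four hedging equations backward for a two-period binomial model.

    payoff(w1, w2) gives V for coin tosses w1, w2 in {'H', 'T'}.
    Returns X0, Delta0, dicts X1 and Delta1 keyed by the first toss, and the prices.
    """
    p = (1 + r - d) / (u - d)      # weight on H in the formulas above
    q = (u - (1 + r)) / (u - d)    # weight on T
    S1 = {'H': u * s0, 'T': d * s0}
    S2 = {(a, b): S1[a] * (u if b == 'H' else d) for a, b in product('HT', repeat=2)}
    X1, Delta1 = {}, {}
    for a in 'HT':
        X1[a] = (p * payoff(a, 'H') + q * payoff(a, 'T')) / (1 + r)
        Delta1[a] = (payoff(a, 'H') - payoff(a, 'T')) / (S2[(a, 'H')] - S2[(a, 'T')])
    X0 = (p * X1['H'] + q * X1['T']) / (1 + r)
    Delta0 = (X1['H'] - X1['T']) / (S1['H'] - S1['T'])
    return X0, Delta0, X1, Delta1, S1, S2


def run_portfolio(s0, r, X0, Delta0, Delta1, S1, S2, w1, w2):
    """Apply X_{k+1} = Delta_k S_{k+1} + (1 + r)(X_k - Delta_k S_k) along one path."""
    x1 = Delta0 * S1[w1] + (1 + r) * (X0 - Delta0 * s0)
    return Delta1[w1] * S2[(w1, w2)] + (1 + r) * (x1 - Delta1[w1] * S1[w1])


# Assumed example: S0 = 4, u = 2, d = 1/2, r = 1/4, call struck at 5 paid at time 2.
s0, u, d, r, strike = 4.0, 2.0, 0.5, 0.25, 5.0


def terminal_price(w1, w2):
    return s0 * (u if w1 == 'H' else d) * (u if w2 == 'H' else d)


def call(w1, w2):
    return max(terminal_price(w1, w2) - strike, 0.0)


X0, Delta0, X1, Delta1, S1, S2 = replicate_two_period(s0, u, d, r, call)
for w1, w2 in product('HT', repeat=2):
    print(w1 + w2, run_portfolio(s0, r, X0, Delta0, Delta1, S1, S2, w1, w2), call(w1, w2))
```

No probabilities enter the check. The hedge replicates $V$ on every path, which is the point made in the next paragraph.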


The probabilities of the stock price paths are irrelevant, because we have a hedge which works on 
every path. From a practical point of view, what matters is that the paths in the model include all 
the possibilities. We want to find a description of the paths in the model. They all have the property 


$$(\log S_{k+1} - \log S_k)^2 = \left(\log\frac{S_{k+1}}{S_k}\right)^2 = (\pm\log u)^2 = (\log u)^2.$$
Let $\sigma = \log u > 0$. Then
$$\sum_{k=0}^{n-1}(\log S_{k+1} - \log S_k)^2 = \sigma^2 n.$$

The paths of $\log S_k$ accumulate quadratic variation at rate $\sigma^2$ per unit time.


If we change $u$, then we change $\sigma$, and the pricing and hedging formulas above will give different results.


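We reiterate that the probabilities are only introduced as an aid to understanding and computation.
Recall:
$$X_{k+1} = \Delta_k S_{k+1} + (1 + r)(X_k - \Delta_k S_k).$$
Define
$$\beta_k = (1 + r)^k.$$
Then
$$\frac{X_{k+1}}{\beta_{k+1}} = \Delta_k\frac{S_{k+1}}{\beta_{k+1}} + \frac{X_k}{\beta_k} - \Delta_k\frac{S_k}{\beta_k},$$
i.e.,
$$\frac{X_{k+1}}{\beta_{k+1}} - \frac{X_k}{\beta_k} = \Delta_k\left(\frac{S_{k+1}}{\beta_{k+1}} - \frac{S_k}{\beta_k}\right).$$

In continuous time, we will have the analogous equation
$$d\left(\frac{X(t)}{\beta(t)}\right) = \Delta(t)\,d\left(\frac{S(t)}{\beta(t)}\right).$$

If we introduce a probability measure $\widetilde{\mathbb{P}}$ under which $\frac{S_k}{\beta_k}$ is a martingale, then $\frac{X_k}{\beta_k}$ will also be a martingale, regardless of the portfolio used. Indeed,
$$\begin{aligned} \widetilde{\mathbb{E}}\left[\frac{X_{k+1}}{\beta_{k+1}}\,\Big|\,\mathcal{F}_k\right] &= \widetilde{\mathbb{E}}\left[\frac{X_k}{\beta_k} + \Delta_k\left(\frac{S_{k+1}}{\beta_{k+1}} - \frac{S_k}{\beta_k}\right)\Big|\,\mathcal{F}_k\right] \\ &= \frac{X_k}{\beta_k} + \Delta_k\left(\widetilde{\mathbb{E}}\left[\frac{S_{k+1}}{\beta_{k+1}}\,\Big|\,\mathcal{F}_k\right] - \frac{S_k}{\beta_k}\right) \\ &= \frac{X_k}{\beta_k}. \end{aligned}$$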

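Suppose we want to have $X_2 = V$, where $V$ is some $\mathcal{F}_2$-measurable random variable. Then we must have
$$X_1 = \beta_1\,\widetilde{\mathbb{E}}\left[\frac{V}{\beta_2}\,\Big|\,\mathcal{F}_1\right], \qquad X_0 = \widetilde{\mathbb{E}}\left[\frac{V}{\beta_2}\right].$$

To find the risk-neutral probability measure $\widetilde{\mathbb{P}}$ under which $\frac{S_k}{\beta_k}$ is a martingale, we denote $\tilde p = \widetilde{\mathbb{P}}\{\omega_k = H\}$, $\tilde q = \widetilde{\mathbb{P}}\{\omega_k = T\}$, and compute
$$\widetilde{\mathbb{E}}\left[\frac{S_{k+1}}{\beta_{k+1}}\,\Big|\,\mathcal{F}_k\right] = \tilde p u\frac{S_k}{\beta_{k+1}} + \tilde q d\frac{S_k}{\beta_{k+1}} = \frac{\tilde p u + \tilde q d}{1 + r}\cdot\frac{S_k}{\beta_k}.$$

We need to choose $\tilde p$ and $\tilde q$ so that
$$\tilde p u + \tilde q d = 1 + r, \qquad \tilde p + \tilde q = 1.$$

The solution of these equations is
$$\tilde p = \frac{1 + r - d}{u - d}, \qquad \tilde q = \frac{u - (1 + r)}{u - d}.$$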
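The two linear equations for $\tilde p, \tilde q$ and the pricing formula $X_0 = \widetilde{\mathbb{E}}[V/\beta_2]$ are easy to check with exact rational arithmetic. The sketch below uses Python's `fractions.Fraction`. The model parameters $u = 2$, $d = 1/2$, $r = 1/4$, $S_0 = 4$ and the call struck at 5 are an assumed example, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def risk_neutral_probs(u, d, r):
    """Solve p*u + q*d = 1 + r, p + q = 1."""
    p = (1 + r - d) / (u - d)
    return p, 1 - p


def risk_neutral_price(s0, u, d, r, n, payoff):
    """X_0 = E~[V / beta_n] for V = payoff(S_n), summing over all 2**n coin-toss paths."""
    p, q = risk_neutral_probs(u, d, r)
    total = 0
    for path in product('HT', repeat=n):
        prob, s = 1, s0
        for toss in path:
            prob *= p if toss == 'H' else q
            s *= u if toss == 'H' else d
        total += prob * payoff(s)
    return total / (1 + r) ** n


u, d, r = Fraction(2), Fraction(1, 2), Fraction(1, 4)
p, q = risk_neutral_probs(u, d, r)
print(p, q, p * u + q * d)                                          # 1/2 1/2 5/4
print(risk_neutral_price(Fraction(4), u, d, r, 2, lambda s: max(s - 5, 0)))  # 44/25
```

The price $44/25 = 1.76$ agrees with the value $X_0$ produced by the backward hedging formulas of Section 22.1 for the same option.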


22.2 Setting up the continuous model 


Now the stock price $S(t)$, $0 \le t \le T$, is a continuous function of $t$. We would like to hedge along every possible path of $S(t)$, but that is impossible. Using the binomial model as a guide, we choose $\sigma > 0$ and try to hedge along every path $S(t)$ for which the quadratic variation of $\log S(t)$ accumulates at rate $\sigma^2$ per unit time. These are the paths with volatility $\sigma$.


To generate these paths, we use Brownian motion, rather than coin-tossing. To introduce Brownian 
motion, we need a probability measure. However, the only thing about this probability measure 
which ultimately matters is the set of paths to which it assigns probability zero. 
Let $B(t)$, $0 \le t \le T$, be a Brownian motion defined on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$. For any $\mu \in \mathbb{R}$, the paths of
$$\mu t + \sigma B(t)$$
accumulate quadratic variation at rate $\sigma^2$ per unit time. We want to define
$$S(t) = S(0)\exp\{\mu t + \sigma B(t)\},$$
so that the paths of
$$\log S(t) = \log S(0) + \mu t + \sigma B(t)$$
accumulate quadratic variation at rate $\sigma^2$ per unit time. Surprisingly, the choice of $\mu$ in this definition is irrelevant. Roughly, the reason for this is the following: Choose $\omega_1 \in \Omega$. Then, for $\mu_1 \in \mathbb{R}$,
$$\mu_1 t + \sigma B(t, \omega_1), \qquad 0 \le t \le T,$$
is a continuous function of $t$. If we replace $\mu_1$ by $\mu_2$, then $\mu_2 t + \sigma B(t, \omega_1)$ is a different function. However, there is an $\omega_2 \in \Omega$ such that
$$\mu_1 t + \sigma B(t, \omega_1) = \mu_2 t + \sigma B(t, \omega_2), \qquad 0 \le t \le T.$$
In other words, regardless of whether we use $\mu_1$ or $\mu_2$ in the definition of $S(t)$, we will see the same paths. The mathematically precise statement is the following:

If a set of stock price paths has a positive probability when $S(t)$ is defined by
$$S(t) = S(0)\exp\{\mu_1 t + \sigma B(t)\},$$
then this set of paths has positive probability when $S(t)$ is defined by
$$S(t) = S(0)\exp\{\mu_2 t + \sigma B(t)\}.$$

Since we are interested in hedging along every path, except possibly for a set of paths which has probability zero, the choice of $\mu$ is irrelevant.


The most convenient choice of $\mu$ is
$$\mu = r - \tfrac12\sigma^2,$$
so
$$S(t) = S(0)\exp\left\{rt + \sigma B(t) - \tfrac12\sigma^2 t\right\},$$
and
$$e^{-rt}S(t) = S(0)\exp\left\{\sigma B(t) - \tfrac12\sigma^2 t\right\}$$
is a martingale under $\mathbb{P}$. With this choice of $\mu$,
$$dS(t) = rS(t)\,dt + \sigma S(t)\,dB(t),$$
and $\mathbb{P}$ is the risk-neutral measure. If a different choice of $\mu$ is made, we have
$$\begin{aligned} S(t) &= S(0)\exp\{\mu t + \sigma B(t)\}, \\ dS(t) &= \left(\mu + \tfrac12\sigma^2\right)S(t)\,dt + \sigma S(t)\,dB(t) \\ &= rS(t)\,dt + \sigma S(t)\underbrace{\left[\frac{\mu + \frac12\sigma^2 - r}{\sigma}\,dt + dB(t)\right]}_{d\widetilde B(t)}. \end{aligned}$$
$\widetilde B$ has the same paths as $B$. We can change to the risk-neutral measure $\widetilde{\mathbb{P}}$, under which $\widetilde B$ is a Brownian motion, and then proceed as if $\mu$ had been chosen to be equal to $r - \frac12\sigma^2$.


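22.3 Risk-neutral pricing and hedging

Let $\widetilde{\mathbb{P}}$ denote the risk-neutral measure. Then
$$dS(t) = rS(t)\,dt + \sigma S(t)\,d\widetilde B(t),$$
where $\widetilde B$ is a Brownian motion under $\widetilde{\mathbb{P}}$. Set
$$\beta(t) = e^{rt}.$$
Then
$$d\left(\frac{S(t)}{\beta(t)}\right) = \frac{S(t)}{\beta(t)}\sigma\,d\widetilde B(t),$$
so $\frac{S(t)}{\beta(t)}$ is a martingale under $\widetilde{\mathbb{P}}$.

Evolution of the value of a portfolio:
$$dX(t) = \Delta(t)\,dS(t) + r\left(X(t) - \Delta(t)S(t)\right)dt, \qquad (3.1)$$
which is equivalent to
$$d\left(\frac{X(t)}{\beta(t)}\right) = \Delta(t)\,d\left(\frac{S(t)}{\beta(t)}\right) = \Delta(t)\sigma\frac{S(t)}{\beta(t)}\,d\widetilde B(t). \qquad (3.2)$$
Regardless of the portfolio used, $\frac{X(t)}{\beta(t)}$ is a martingale under $\widetilde{\mathbb{P}}$.

Now suppose $V$ is a given $\mathcal{F}(T)$-measurable random variable, the payoff of a simple European derivative security. We want to find the portfolio process $\Delta(t)$, $0 \le t \le T$, and initial portfolio value $X(0)$ so that $X(T) = V$. Because $\frac{X(t)}{\beta(t)}$ must be a martingale, we must have
$$\frac{X(t)}{\beta(t)} = \widetilde{\mathbb{E}}\left[\frac{V}{\beta(T)}\,\Big|\,\mathcal{F}(t)\right], \qquad 0 \le t \le T. \qquad (3.3)$$
This is the risk-neutral pricing formula. We have the following sequence: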
1. $V$ is given,
2. Define $X(t)$, $0 \le t \le T$, by (3.3) (not by (3.1) or (3.2), because we do not yet have $\Delta(t)$).
3. Construct $\Delta(t)$ so that (3.2) (or equivalently, (3.1)) is satisfied by the $X(t)$, $0 \le t \le T$, defined in step 2.

To carry out step 3, we first use the tower property to show that $\frac{X(t)}{\beta(t)}$ defined by (3.3) is a martingale under $\widetilde{\mathbb{P}}$. We next use the corollary to the Martingale Representation Theorem (Homework Problem 4.5) to show that
$$d\left(\frac{X(t)}{\beta(t)}\right) = \gamma(t)\,d\widetilde B(t) \qquad (3.4)$$
for some process $\gamma$. Comparing (3.4), which we know, and (3.2), which we want, we decide to define
$$\Delta(t) = \frac{\beta(t)\gamma(t)}{\sigma S(t)}. \qquad (3.5)$$
Then (3.4) implies (3.2), which implies (3.1), which implies that $X(t)$, $0 \le t \le T$, is the value of the portfolio process $\Delta(t)$, $0 \le t \le T$.

From (3.3), the definition of $X$, we see that the hedging portfolio must begin with value
$$X(0) = \widetilde{\mathbb{E}}\left[\frac{V}{\beta(T)}\right],$$
and it will end with value
$$X(T) = \beta(T)\,\widetilde{\mathbb{E}}\left[\frac{V}{\beta(T)}\,\Big|\,\mathcal{F}(T)\right] = \beta(T)\frac{V}{\beta(T)} = V.$$

Remark 22.1 Although we have taken $r$ and $\sigma$ to be constant, the risk-neutral pricing formula is still "valid" when $r$ and $\sigma$ are processes adapted to the filtration generated by $B$. If they depend on either $B$ or $S$, they are adapted to the filtration generated by $B$. The "validity" of the risk-neutral pricing formula means:

1. If you start with
$$X(0) = \widetilde{\mathbb{E}}\left[\frac{V}{\beta(T)}\right],$$
then there is a hedging portfolio $\Delta(t)$, $0 \le t \le T$, such that $X(T) = V$;
2. At each time $t$, the value $X(t)$ of the hedging portfolio in 1 satisfies
$$\frac{X(t)}{\beta(t)} = \widetilde{\mathbb{E}}\left[\frac{V}{\beta(T)}\,\Big|\,\mathcal{F}(t)\right],$$
where now $\beta(t) = \exp\left\{\int_0^t r(u)\,du\right\}$.

Remark 22.2 In general, when there are multiple assets and/or multiple Brownian motions, the risk-neutral pricing formula is valid provided there is a unique risk-neutral measure. A probability measure is said to be risk-neutral provided

• it has the same probability-zero sets as the original measure;
• it makes all the discounted asset prices be martingales.

To see if the risk-neutral measure is unique, compute the differential of all discounted asset prices and check if there is more than one way to define $\widetilde B$ so that all these differentials have only $d\widetilde B$ terms.


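22.4 Implementation of risk-neutral pricing and hedging

To get a computable result from the general risk-neutral pricing formula
$$\frac{X(t)}{\beta(t)} = \widetilde{\mathbb{E}}\left[\frac{V}{\beta(T)}\,\Big|\,\mathcal{F}(t)\right],$$
one uses the Markov property. We need to identify some state variables, the stock price and possibly other variables, so that
$$X(t) = \beta(t)\,\widetilde{\mathbb{E}}\left[\frac{V}{\beta(T)}\,\Big|\,\mathcal{F}(t)\right]$$
is a function of these variables.

Example 22.1 Assume $r$ and $\sigma$ are constant, and $V = h(S(T))$. We can take the stock price to be the state variable. Define
$$v(t, x) = \widetilde{\mathbb{E}}^{t,x}\left[e^{-r(T-t)}h(S(T))\right].$$
Then
$$X(t) = e^{rt}\,\widetilde{\mathbb{E}}\left[e^{-rT}h(S(T))\,\Big|\,\mathcal{F}(t)\right] = v(t, S(t)),$$
and $\frac{X(t)}{\beta(t)} = e^{-rt}v(t, S(t))$ is a martingale under $\widetilde{\mathbb{P}}$. □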


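Example 22.2 Assume $r$ and $\sigma$ are constant, and
$$V = h\left(\int_0^T S(u)\,du\right).$$
Take $S(t)$ and $Y(t) = \int_0^t S(u)\,du$ to be the state variables. Define
$$v(t, x, y) = \widetilde{\mathbb{E}}^{t,x,y}\left[e^{-r(T-t)}h(Y(T))\right],$$
where
$$Y(T) = y + \int_t^T S(u)\,du.$$
Then
$$X(t) = e^{rt}\,\widetilde{\mathbb{E}}\left[e^{-rT}h(Y(T))\,\Big|\,\mathcal{F}(t)\right] = v(t, S(t), Y(t)),$$
and
$$\frac{X(t)}{\beta(t)} = e^{-rt}v(t, S(t), Y(t))$$
is a martingale under $\widetilde{\mathbb{P}}$. □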


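Example 22.3 (Homework problem 4.2)
$$\begin{aligned} dS(t) &= r(t, Y(t))S(t)\,dt + \sigma(t, Y(t))S(t)\,d\widetilde B(t), \\ dY(t) &= \alpha(t, Y(t))\,dt + \gamma(t, Y(t))\,d\widetilde B(t), \\ V &= h(S(T)). \end{aligned}$$
Take $S(t)$ and $Y(t)$ to be the state variables. Define
$$v(t, x, y) = \widetilde{\mathbb{E}}^{t,x,y}\left[\exp\left\{-\int_t^T r(u, Y(u))\,du\right\}h(S(T))\right].$$
Then
$$\begin{aligned} X(t) &= \beta(t)\,\widetilde{\mathbb{E}}\left[\exp\left\{-\int_0^T r(u, Y(u))\,du\right\}h(S(T))\,\Big|\,\mathcal{F}(t)\right] \\ &= \widetilde{\mathbb{E}}\left[\exp\left\{-\int_t^T r(u, Y(u))\,du\right\}h(S(T))\,\Big|\,\mathcal{F}(t)\right] \\ &= v(t, S(t), Y(t)), \end{aligned}$$
and
$$\frac{X(t)}{\beta(t)} = \exp\left\{-\int_0^t r(u, Y(u))\,du\right\}v(t, S(t), Y(t))$$
is a martingale under $\widetilde{\mathbb{P}}$. □

In every case, we get an expression involving $v$ that is a martingale. We take the differential and set the $dt$ term to zero. This gives us a partial differential equation for $v$, and this equation must hold wherever the state processes can be. The $d\widetilde B$ term in the differential of this expression is the differential of a martingale, and since the martingale is
$$\frac{X(t)}{\beta(t)} = X(0) + \int_0^t \Delta(u)\sigma(u, Y(u))\frac{S(u)}{\beta(u)}\,d\widetilde B(u),$$
we can solve for $\Delta(t)$. This is the argument which uses (3.4) to obtain (3.5).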
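Example 22.4 (Continuation of Example 22.3)
$$\frac{X(t)}{\beta(t)} = \exp\left\{-\int_0^t r(u, Y(u))\,du\right\}v(t, S(t), Y(t))$$
is a martingale under $\widetilde{\mathbb{P}}$. We have
$$\begin{aligned} d\left(\frac{X(t)}{\beta(t)}\right) &= \exp\left\{-\int_0^t r(u, Y(u))\,du\right\}\Big[-r(t, Y(t))v(t, S(t), Y(t))\,dt + v_t\,dt + v_x\,dS + v_y\,dY \\ &\qquad + \tfrac12 v_{xx}\,dS\,dS + v_{xy}\,dS\,dY + \tfrac12 v_{yy}\,dY\,dY\Big] \\ &= \exp\left\{-\int_0^t r(u, Y(u))\,du\right\}\Big[\left(-rv + v_t + rSv_x + \alpha v_y + \tfrac12\sigma^2 S^2 v_{xx} + \gamma\sigma S v_{xy} + \tfrac12\gamma^2 v_{yy}\right)dt \\ &\qquad + \left(\sigma S v_x + \gamma v_y\right)d\widetilde B\Big]. \end{aligned}$$
The partial differential equation satisfied by $v$ is
$$-rv + v_t + rxv_x + \alpha v_y + \tfrac12\sigma^2 x^2 v_{xx} + \gamma\sigma x v_{xy} + \tfrac12\gamma^2 v_{yy} = 0,$$
where it should be noted that $v = v(t, x, y)$, and all other variables are functions of $(t, y)$. We have
$$d\left(\frac{X(t)}{\beta(t)}\right) = \exp\left\{-\int_0^t r(u, Y(u))\,du\right\}\left(\sigma S v_x + \gamma v_y\right)d\widetilde B,$$
where $\sigma = \sigma(t, Y(t))$, $\alpha = \alpha(t, Y(t))$, $\gamma = \gamma(t, Y(t))$, $v = v(t, S(t), Y(t))$, and $S = S(t)$. We want to choose $\Delta(t)$ so that
$$d\left(\frac{X(t)}{\beta(t)}\right) = \Delta(t)\sigma(t, Y(t))\frac{S(t)}{\beta(t)}\,d\widetilde B(t).$$
Therefore, we should take $\Delta(t)$ to be
$$\Delta(t) = v_x(t, S(t), Y(t)) + \frac{\gamma(t, Y(t))}{\sigma(t, Y(t))S(t)}v_y(t, S(t), Y(t)).$$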
Chapter 23


Recognizing a Brownian Motion 


Theorem 0.62 (Lévy) Let $B(t)$, $0 \le t \le T$, be a process on $(\Omega, \mathcal{F}, \mathbb{P})$, adapted to a filtration $\mathcal{F}(t)$, $0 \le t \le T$, such that:

1. the paths of $B(t)$ are continuous,
2. $B$ is a martingale,
3. $\langle B\rangle(t) = t$, $0 \le t \le T$ (i.e., informally, $dB(t)\,dB(t) = dt$).

Then $B$ is a Brownian motion.
Proof: (Idea) Let $0 \le s < t \le T$ be given. We need to show that $B(t) - B(s)$ is normal, with mean zero and variance $t - s$, and $B(t) - B(s)$ is independent of $\mathcal{F}(s)$. We shall show that the conditional moment generating function of $B(t) - B(s)$ is
$$\mathbb{E}\left[e^{u(B(t) - B(s))}\,\Big|\,\mathcal{F}(s)\right] = e^{\frac12u^2(t - s)}.$$
Since the moment generating function characterizes the distribution, this shows that $B(t) - B(s)$ is normal with mean 0 and variance $t - s$, and conditioning on $\mathcal{F}(s)$ does not affect this, i.e., $B(t) - B(s)$ is independent of $\mathcal{F}(s)$.


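We compute (this uses the continuity condition (1) of the theorem)
$$de^{uB(t)} = ue^{uB(t)}\,dB(t) + \tfrac12u^2e^{uB(t)}\,dB(t)\,dB(t),$$
so
$$e^{uB(t)} = e^{uB(s)} + \int_s^t ue^{uB(v)}\,dB(v) + \tfrac12u^2\int_s^t e^{uB(v)}\,dv,$$
where the last integral uses condition (3).

Now $\int_0^t ue^{uB(v)}\,dB(v)$ is a martingale (by condition 2), and so
$$\mathbb{E}\left[\int_s^t ue^{uB(v)}\,dB(v)\,\Big|\,\mathcal{F}(s)\right] = -\int_0^s ue^{uB(v)}\,dB(v) + \mathbb{E}\left[\int_0^t ue^{uB(v)}\,dB(v)\,\Big|\,\mathcal{F}(s)\right] = 0.$$
It follows that
$$\mathbb{E}\left[e^{uB(t)}\,\Big|\,\mathcal{F}(s)\right] = e^{uB(s)} + \tfrac12u^2\int_s^t \mathbb{E}\left[e^{uB(v)}\,\Big|\,\mathcal{F}(s)\right]dv.$$
We define
$$\varphi(t) = \mathbb{E}\left[e^{uB(t)}\,\Big|\,\mathcal{F}(s)\right],$$
so that
$$\varphi(s) = e^{uB(s)}$$
and
$$\varphi'(t) = \tfrac12u^2\varphi(t).$$
The solution of this differential equation is $\varphi(t) = \varphi(s)e^{\frac12u^2(t - s)}$. Plugging in $\varphi(s)$, we get
$$\mathbb{E}\left[e^{uB(t)}\,\Big|\,\mathcal{F}(s)\right] = e^{uB(s)}e^{\frac12u^2(t - s)}.$$
Therefore,
$$\mathbb{E}\left[e^{u(B(t) - B(s))}\,\Big|\,\mathcal{F}(s)\right] = e^{\frac12u^2(t - s)}.$$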

23.1 Identifying volatility and correlation 


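Let $B_1$ and $B_2$ be independent Brownian motions and
$$\frac{dS_1}{S_1} = r\,dt + \sigma_{11}\,dB_1 + \sigma_{12}\,dB_2,$$
$$\frac{dS_2}{S_2} = r\,dt + \sigma_{21}\,dB_1 + \sigma_{22}\,dB_2.$$
Define
$$\sigma_1 = \sqrt{\sigma_{11}^2 + \sigma_{12}^2}, \qquad \sigma_2 = \sqrt{\sigma_{21}^2 + \sigma_{22}^2}, \qquad \rho = \frac{\sigma_{11}\sigma_{21} + \sigma_{12}\sigma_{22}}{\sigma_1\sigma_2}.$$
Define processes $W_1$ and $W_2$ by
$$dW_1 = \frac{\sigma_{11}\,dB_1 + \sigma_{12}\,dB_2}{\sigma_1}, \qquad dW_2 = \frac{\sigma_{21}\,dB_1 + \sigma_{22}\,dB_2}{\sigma_2}.$$
Then $W_1$ and $W_2$ have continuous paths, are martingales, and
$$\begin{aligned} dW_1\,dW_1 &= \frac{1}{\sigma_1^2}\left(\sigma_{11}\,dB_1 + \sigma_{12}\,dB_2\right)^2 \\ &= \frac{1}{\sigma_1^2}\left(\sigma_{11}^2\,dB_1\,dB_1 + \sigma_{12}^2\,dB_2\,dB_2\right) \\ &= dt, \end{aligned}$$
since $dB_1\,dB_2 = 0$, and similarly
$$dW_2\,dW_2 = dt.$$
Therefore, $W_1$ and $W_2$ are Brownian motions. The stock prices have the representation
$$\frac{dS_1}{S_1} = r\,dt + \sigma_1\,dW_1, \qquad \frac{dS_2}{S_2} = r\,dt + \sigma_2\,dW_2.$$
The Brownian motions $W_1$ and $W_2$ are correlated. Indeed,
$$\begin{aligned} dW_1\,dW_2 &= \frac{1}{\sigma_1\sigma_2}\left(\sigma_{11}\,dB_1 + \sigma_{12}\,dB_2\right)\left(\sigma_{21}\,dB_1 + \sigma_{22}\,dB_2\right) \\ &= \frac{1}{\sigma_1\sigma_2}\left(\sigma_{11}\sigma_{21} + \sigma_{12}\sigma_{22}\right)dt \\ &= \rho\,dt. \end{aligned}$$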
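The definitions of $\sigma_1$, $\sigma_2$, and $\rho$ can be checked by simulating one time step of the two independent Brownian motions. The Python sketch below computes $\rho$ from the formula and compares it with the sample correlation of the simulated increments $dW_1$, $dW_2$. The matrix entries, sample size, and seed are assumptions chosen for illustration.

```python
import math
import random


def vol_and_correlation(sig):
    """Given [[s11, s12], [s21, s22]], return (sigma1, sigma2, rho) as defined above."""
    (s11, s12), (s21, s22) = sig
    sigma1 = math.hypot(s11, s12)
    sigma2 = math.hypot(s21, s22)
    rho = (s11 * s21 + s12 * s22) / (sigma1 * sigma2)
    return sigma1, sigma2, rho


def sample_correlation(sig, n=50_000, seed=7):
    """Empirical correlation of simulated increments dW1, dW2 (unit time step)."""
    rng = random.Random(seed)
    (s11, s12), (s21, s22) = sig
    sigma1, sigma2, _ = vol_and_correlation(sig)
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)  # independent dB1, dB2
        xs.append((s11 * z1 + s12 * z2) / sigma1)          # dW1
        ys.append((s21 * z1 + s22 * z2) / sigma2)          # dW2
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    return cov / math.sqrt(vx * vy)


sig = [[0.2, 0.1], [0.05, 0.3]]  # assumed volatility matrix
print(vol_and_correlation(sig), sample_correlation(sig))
```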
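23.2 Reversing the process

Suppose we are given that
$$\frac{dS_1}{S_1} = r\,dt + \sigma_1\,dW_1, \qquad \frac{dS_2}{S_2} = r\,dt + \sigma_2\,dW_2,$$
where $W_1$ and $W_2$ are Brownian motions with correlation coefficient $\rho$. We want to find
$$\sigma = \begin{bmatrix}\sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22}\end{bmatrix}$$
so that
$$\sigma\sigma^{\mathsf T} = \begin{bmatrix}\sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22}\end{bmatrix}\begin{bmatrix}\sigma_{11} & \sigma_{21} \\ \sigma_{12} & \sigma_{22}\end{bmatrix} = \begin{bmatrix}\sigma_{11}^2 + \sigma_{12}^2 & \sigma_{11}\sigma_{21} + \sigma_{12}\sigma_{22} \\ \sigma_{11}\sigma_{21} + \sigma_{12}\sigma_{22} & \sigma_{21}^2 + \sigma_{22}^2\end{bmatrix} = \begin{bmatrix}\sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2\end{bmatrix}.$$
A simple (but not unique) solution is (see Chapter 19)
$$\sigma_{11} = \sigma_1, \quad \sigma_{12} = 0, \qquad \sigma_{21} = \rho\sigma_2, \quad \sigma_{22} = \sqrt{1-\rho^2}\,\sigma_2.$$
This corresponds to
$$\sigma_1\,dW_1 = \sigma_1\,dB_1 \implies dB_1 = dW_1,$$
$$\sigma_2\,dW_2 = \rho\sigma_2\,dB_1 + \sqrt{1-\rho^2}\,\sigma_2\,dB_2 \implies dB_2 = \frac{dW_2 - \rho\,dW_1}{\sqrt{1-\rho^2}} \quad (\rho \ne \pm1).$$
If $\rho = \pm1$, then there is no $B_2$ and $dW_2 = \rho\,dB_1 = \rho\,dW_1$.

Continuing in the case $\rho \ne \pm1$, we have
$$dB_1\,dB_1 = dW_1\,dW_1 = dt,$$
$$\begin{aligned} dB_2\,dB_2 &= \frac{1}{1-\rho^2}\left(dW_2\,dW_2 - 2\rho\,dW_1\,dW_2 + \rho^2\,dW_1\,dW_1\right) \\ &= \frac{1}{1-\rho^2}\left(dt - 2\rho^2\,dt + \rho^2\,dt\right) \\ &= dt, \end{aligned}$$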
so both $B_1$ and $B_2$ are Brownian motions. Furthermore,
$$dB_1\,dB_2 = \frac{1}{\sqrt{1-\rho^2}}\left(dW_1\,dW_2 - \rho\,dW_1\,dW_1\right) = \frac{1}{\sqrt{1-\rho^2}}\left(\rho\,dt - \rho\,dt\right) = 0.$$

We can now apply an extension of Lévy's Theorem that says that Brownian motions with zero cross-variation are independent, to conclude that $B_1$, $B_2$ are independent Brownian motions.
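The claim that this factorization is simple but not unique can be illustrated numerically. In the Python sketch below (the inputs are assumed), the lower-triangular choice of $\sigma$ reproduces the target matrix $\begin{bmatrix}\sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2\end{bmatrix}$. So does the same matrix multiplied on the right by any rotation, which amounts to using a rotated pair of independent Brownian motions in place of $B_1, B_2$.

```python
import math


def lower_triangular_factor(sigma1, sigma2, rho):
    """The simple choice sigma11 = sigma1, sigma12 = 0, sigma21 = rho*sigma2,
    sigma22 = sqrt(1 - rho^2)*sigma2 (a Cholesky factor of the target matrix)."""
    return [[sigma1, 0.0], [rho * sigma2, math.sqrt(1.0 - rho * rho) * sigma2]]


def times_transpose(a):
    """Return a a^T for a 2x2 matrix given as nested lists."""
    return [[sum(a[i][k] * a[j][k] for k in range(2)) for j in range(2)] for i in range(2)]


def rotate(a, angle):
    """Right-multiply by a rotation R; since R R^T = I, (aR)(aR)^T = a a^T."""
    c, s = math.cos(angle), math.sin(angle)
    return [[a[i][0] * c + a[i][1] * s, -a[i][0] * s + a[i][1] * c] for i in range(2)]


sigma1, sigma2, rho = 0.25, 0.4, -0.6  # assumed inputs
target = [[sigma1 ** 2, rho * sigma1 * sigma2], [rho * sigma1 * sigma2, sigma2 ** 2]]
factor = lower_triangular_factor(sigma1, sigma2, rho)
print(times_transpose(factor), times_transpose(rotate(factor, 0.7)), target)
```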
Chapter 24


An outside barrier option 


Barrier process:
$$\frac{dY}{Y} = \nu\,dt + \sigma_1\,dB_1(t).$$

Stock process:
$$\frac{dS}{S} = \mu\,dt + \rho\sigma_2\,dB_1(t) + \sqrt{1-\rho^2}\,\sigma_2\,dB_2(t),$$
where $\sigma_1 > 0$, $\sigma_2 > 0$, $-1 < \rho < 1$, and $B_1$ and $B_2$ are independent Brownian motions on some $(\Omega, \mathcal{F}, \mathbb{P})$. The option pays off
$$(S(T) - K)^+\,\mathbb{1}_{\{Y^*(T) < L\}}$$
at time $T$, where
$$0 < S(0) < K, \qquad 0 < Y(0) < L, \qquad Y^*(T) = \max_{0 \le t \le T} Y(t).$$

Remark 24.1 The option payoff depends on both the $Y$ and $S$ processes. In order to hedge it, we will need the money market and two other assets, which we take to be $Y$ and $S$. The risk-neutral measure must make the discounted value of every traded asset a martingale, which in this case means the discounted $Y$ and $S$ processes.


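We want to find $\theta_1$ and $\theta_2$ and define
$$d\widetilde B_1 = \theta_1\,dt + dB_1, \qquad d\widetilde B_2 = \theta_2\,dt + dB_2,$$
so that
$$\begin{aligned} \frac{dY}{Y} &= r\,dt + \sigma_1\,d\widetilde B_1 \\ &= r\,dt + \sigma_1\theta_1\,dt + \sigma_1\,dB_1, \end{aligned}$$
$$\begin{aligned} \frac{dS}{S} &= r\,dt + \rho\sigma_2\,d\widetilde B_1 + \sqrt{1-\rho^2}\,\sigma_2\,d\widetilde B_2 \\ &= r\,dt + \rho\sigma_2\theta_1\,dt + \sqrt{1-\rho^2}\,\sigma_2\theta_2\,dt + \rho\sigma_2\,dB_1 + \sqrt{1-\rho^2}\,\sigma_2\,dB_2. \end{aligned}$$
We must have
$$\nu = r + \sigma_1\theta_1, \qquad (0.1)$$
$$\mu = r + \rho\sigma_2\theta_1 + \sqrt{1-\rho^2}\,\sigma_2\theta_2. \qquad (0.2)$$
We solve to get
$$\theta_1 = \frac{\nu - r}{\sigma_1}, \qquad \theta_2 = \frac{\mu - r - \rho\sigma_2\theta_1}{\sqrt{1-\rho^2}\,\sigma_2}.$$

We shall see that the formulas for $\theta_1$ and $\theta_2$ do not matter. What matters is that (0.1) and (0.2) uniquely determine $\theta_1$ and $\theta_2$. This implies the existence and uniqueness of the risk-neutral measure. We define
$$Z(T) = \exp\left\{-\theta_1 B_1(T) - \theta_2 B_2(T) - \tfrac12\left(\theta_1^2 + \theta_2^2\right)T\right\},$$
$$\widetilde{\mathbb{P}}(A) = \int_A Z(T)\,d\mathbb{P}, \quad \forall A \in \mathcal{F}.$$
Under $\widetilde{\mathbb{P}}$, $\widetilde B_1$ and $\widetilde B_2$ are independent Brownian motions (Girsanov's Theorem). $\widetilde{\mathbb{P}}$ is the unique risk-neutral measure.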


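Remark 24.2 Under both $\mathbb{P}$ and $\widetilde{\mathbb{P}}$, $Y$ has volatility $\sigma_1$, $S$ has volatility $\sigma_2$, and
$$\frac{dY}{Y}\,\frac{dS}{S} = \rho\sigma_1\sigma_2\,dt,$$
i.e., the correlation between $\frac{dY}{Y}$ and $\frac{dS}{S}$ is $\rho$.
The value of the option at time zero is
$$v(0, S(0), Y(0)) = \widetilde{\mathbb{E}}\left[e^{-rT}(S(T) - K)^+\,\mathbb{1}_{\{Y^*(T) < L\}}\right].$$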


We need to work out a density which permits us to compute the right-hand side. 
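Recall that the barrier process is
$$\frac{dY}{Y} = r\,dt + \sigma_1\,d\widetilde B_1,$$
so
$$Y(t) = Y(0)\exp\left\{rt + \sigma_1\widetilde B_1(t) - \tfrac12\sigma_1^2 t\right\}.$$
Set
$$\theta = \frac{r}{\sigma_1} - \frac{\sigma_1}{2}, \qquad \hat B(t) = \theta t + \widetilde B_1(t), \qquad \hat M(T) = \max_{0 \le t \le T}\hat B(t).$$
Then
$$Y(t) = Y(0)\exp\{\sigma_1\hat B(t)\}, \qquad Y^*(T) = Y(0)\exp\{\sigma_1\hat M(T)\}.$$
The joint density of $\hat B(T)$ and $\hat M(T)$, appearing in Chapter 20, is
$$\widetilde{\mathbb{P}}\{\hat B(T) \in d\hat b,\ \hat M(T) \in d\hat m\} = \frac{2(2\hat m - \hat b)}{T\sqrt{2\pi T}}\exp\left\{-\frac{(2\hat m - \hat b)^2}{2T} + \theta\hat b - \tfrac12\theta^2 T\right\}d\hat b\,d\hat m, \qquad \hat m > 0,\ \hat b < \hat m.$$
The stock process:
$$\frac{dS}{S} = r\,dt + \rho\sigma_2\,d\widetilde B_1 + \sqrt{1-\rho^2}\,\sigma_2\,d\widetilde B_2,$$
so
$$\begin{aligned} S(T) &= S(0)\exp\left\{rT + \rho\sigma_2\widetilde B_1(T) - \tfrac12\rho^2\sigma_2^2 T + \sqrt{1-\rho^2}\,\sigma_2\widetilde B_2(T) - \tfrac12(1-\rho^2)\sigma_2^2 T\right\} \\ &= S(0)\exp\left\{rT - \tfrac12\sigma_2^2 T + \rho\sigma_2\widetilde B_1(T) + \sqrt{1-\rho^2}\,\sigma_2\widetilde B_2(T)\right\}. \end{aligned}$$
From the above paragraph we have
$$\widetilde B_1(T) = -\theta T + \hat B(T),$$
so
$$S(T) = S(0)\exp\left\{rT + \rho\sigma_2\hat B(T) - \tfrac12\sigma_2^2 T - \rho\sigma_2\theta T + \sqrt{1-\rho^2}\,\sigma_2\widetilde B_2(T)\right\}.$$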
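24.1 Computing the option value

$$\begin{aligned} v(0, S(0), Y(0)) &= \widetilde{\mathbb{E}}\left[e^{-rT}(S(T) - K)^+\,\mathbb{1}_{\{Y^*(T) < L\}}\right] \\ &= e^{-rT}\,\widetilde{\mathbb{E}}\left[\left(S(0)\exp\left\{\left(r - \tfrac12\sigma_2^2 - \rho\sigma_2\theta\right)T + \rho\sigma_2\hat B(T) + \sqrt{1-\rho^2}\,\sigma_2\widetilde B_2(T)\right\} - K\right)^+\mathbb{1}_{\left\{\hat M(T) < \frac{1}{\sigma_1}\log\frac{L}{Y(0)}\right\}}\right]. \end{aligned}$$

We know the joint density of $(\hat B(T), \hat M(T))$. The density of $\widetilde B_2(T)$ is
$$\widetilde{\mathbb{P}}\{\widetilde B_2(T) \in d\tilde b\} = \frac{1}{\sqrt{2\pi T}}\exp\left\{-\frac{\tilde b^2}{2T}\right\}d\tilde b, \qquad \tilde b \in \mathbb{R}.$$
Furthermore, the pair of random variables $(\hat B(T), \hat M(T))$ is independent of $\widetilde B_2(T)$ because $\widetilde B_1$ and $\widetilde B_2$ are independent under $\widetilde{\mathbb{P}}$. Therefore, the joint density of the random vector $(\widetilde B_2(T), \hat B(T), \hat M(T))$ is
$$\widetilde{\mathbb{P}}\{\widetilde B_2(T) \in d\tilde b,\ \hat B(T) \in d\hat b,\ \hat M(T) \in d\hat m\} = \widetilde{\mathbb{P}}\{\widetilde B_2(T) \in d\tilde b\}\cdot\widetilde{\mathbb{P}}\{\hat B(T) \in d\hat b,\ \hat M(T) \in d\hat m\}.$$
The option value at time zero is
$$\begin{aligned} v(0, S(0), Y(0)) = e^{-rT}\int_0^{\frac{1}{\sigma_1}\log\frac{L}{Y(0)}}\int_{-\infty}^{\hat m}\int_{-\infty}^{\infty} &\left(S(0)\exp\left\{\left(r - \tfrac12\sigma_2^2 - \rho\sigma_2\theta\right)T + \rho\sigma_2\hat b + \sqrt{1-\rho^2}\,\sigma_2\tilde b\right\} - K\right)^+ \\ &\times \frac{1}{\sqrt{2\pi T}}e^{-\frac{\tilde b^2}{2T}}\cdot\frac{2(2\hat m - \hat b)}{T\sqrt{2\pi T}}\exp\left\{-\frac{(2\hat m - \hat b)^2}{2T} + \theta\hat b - \tfrac12\theta^2 T\right\}d\tilde b\,d\hat b\,d\hat m. \end{aligned}$$
The answer depends on $T$, $S(0)$, and $Y(0)$. It also depends on $\sigma_1$, $\sigma_2$, $\rho$, $r$, $K$, and $L$. It does not depend on $\nu$, $\mu$, $\theta_1$, nor $\theta_2$. The parameter $\theta$ appearing in the answer is $\theta = \frac{r}{\sigma_1} - \frac{\sigma_1}{2}$.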
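The triple integral can be reduced to a single integral. The $\tilde b$-integral is a lognormal call expectation with a closed form, and the $\hat m$-integral of the joint density over $\max(\hat b, 0) < \hat m < \frac{1}{\sigma_1}\log\frac{L}{Y(0)}$ can be done exactly, as in Chapter 20. The Python sketch below implements this reduction; the reduction, function names, and parameter values are illustrative choices, not from the text. It checks the case $\rho = 0$, where $S$ and $Y$ are independent and the value must factor as a Black-Scholes call price times $\widetilde{\mathbb{P}}\{\hat M(T) < \frac{1}{\sigma_1}\log\frac{L}{Y(0)}\}$.

```python
import math


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def outside_barrier_call(s0, y0, K, L, r, sigma1, sigma2, rho, T, n=4000):
    """v(0, S(0), Y(0)) from the triple-integral formula, for -1 < rho < 1.

    The b2-tilde integral is done in closed form and the m-hat integral of the
    joint density is done exactly, leaving one b-hat integral (Simpson's rule).
    """
    theta = r / sigma1 - sigma1 / 2.0
    m_star = math.log(L / y0) / sigma1         # knock-out level for M-hat(T)
    c = math.sqrt(1.0 - rho * rho) * sigma2    # coefficient of B2-tilde(T) in log S(T)
    rt = math.sqrt(T)

    def call_given_b(b):
        # E~[(A e^{c Z} - K)^+] for Z ~ N(0, T); A collects the non-B2 terms of S(T)
        A = s0 * math.exp((r - 0.5 * sigma2 ** 2 - rho * sigma2 * theta) * T + rho * sigma2 * b)
        d_minus = math.log(A / K) / (c * rt)
        return A * math.exp(0.5 * c * c * T) * norm_cdf(d_minus + c * rt) - K * norm_cdf(d_minus)

    def b_density(b):
        # integral over m in (max(b, 0), m_star) of the joint density of (B-hat, M-hat)
        g = math.exp(-b * b / (2 * T)) - math.exp(-(2 * m_star - b) ** 2 / (2 * T))
        return g / math.sqrt(2 * math.pi * T) * math.exp(theta * b - 0.5 * theta ** 2 * T)

    lo = m_star - 12.0 * rt - 2.0 * abs(theta) * T  # truncation of b -> -infinity
    h = (m_star - lo) / n
    total = 0.0
    for i in range(n + 1):
        weight = 1.0 if i in (0, n) else (4.0 if i % 2 else 2.0)
        b = lo + i * h
        total += weight * call_given_b(b) * b_density(b)
    return math.exp(-r * T) * total * h / 3.0


def black_scholes_call(s0, K, r, sigma, T):
    d_plus = (math.log(s0 / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    return s0 * norm_cdf(d_plus) - K * math.exp(-r * T) * norm_cdf(d_plus - sigma * math.sqrt(T))


def prob_no_knockout(m, theta, T):
    """P~{M-hat(T) < m} for Brownian motion with drift theta."""
    rt = math.sqrt(T)
    return norm_cdf((m - theta * T) / rt) - math.exp(2 * theta * m) * norm_cdf((-m - theta * T) / rt)


base = dict(s0=95.0, y0=100.0, K=100.0, L=120.0, r=0.05, sigma1=0.25, sigma2=0.2, T=1.0)
v0 = outside_barrier_call(rho=0.0, **base)
theta = base['r'] / base['sigma1'] - base['sigma1'] / 2
check = black_scholes_call(base['s0'], base['K'], base['r'], base['sigma2'], base['T']) * \
    prob_no_knockout(math.log(base['L'] / base['y0']) / base['sigma1'], theta, base['T'])
print(v0, check, outside_barrier_call(rho=0.5, **base))
```

For $\rho \ne 0$ no such factorization is available, and the sketch simply evaluates the reduced integral.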


Remark 24.3 If we had not regarded $Y$ as a traded asset, then we would not have tried to set its mean return equal to $r$. We would have had only one equation (see Eqs (0.1), (0.2))
$$\mu = r + \rho\sigma_2\theta_1 + \sqrt{1-\rho^2}\,\sigma_2\theta_2, \qquad (1.1)$$
to determine $\theta_1$ and $\theta_2$. The nonuniqueness of the solution alerts us that some options cannot be hedged. Indeed, any option whose payoff depends on $Y$ cannot be hedged when we are allowed to trade only in the stock.

If we have an option whose payoff depends only on $S$, then $Y$ is superfluous. Returning to the original equation for $S$,
$$\frac{dS}{S} = \mu\,dt + \rho\sigma_2\,dB_1 + \sqrt{1-\rho^2}\,\sigma_2\,dB_2,$$
we should set
$$dW = \rho\,dB_1 + \sqrt{1-\rho^2}\,dB_2,$$
so $W$ is a Brownian motion under $\mathbb{P}$ (Lévy's theorem), and
$$\frac{dS}{S} = \mu\,dt + \sigma_2\,dW.$$
Now we have only one Brownian motion, so there will be only one $\theta$, namely,
$$\theta = \frac{\mu - r}{\sigma_2},$$
so with $d\widetilde W = \theta\,dt + dW$, we have
$$\frac{dS}{S} = r\,dt + \sigma_2\,d\widetilde W,$$
and we are on our way.


24.2 The PDE for the outside barrier option 


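Returning to the case of the option with payoff $(S(T)-K)^+\,1_{\{\max_{0\le t\le T}Y(t)<L\}}$, we obtain a formula for
$$v(t,x,y) = \widetilde{\mathbb E}^{t,x,y}\Big[e^{-r(T-t)}(S(T)-K)^+\,1_{\{\max_{t\le u\le T}Y(u)<L\}}\Big]$$
by replacing $T$, $S(0)$ and $Y(0)$ by $T-t$, $x$ and $y$ respectively in the formula for $v(0,S(0),Y(0))$. Now start at time 0 at $S(0)$ and $Y(0)$. Using the Markov property, we can show that the stochastic process
$$e^{-rt}v(t,S(t),Y(t))$$
is a martingale under $\widetilde{\mathbb P}$. We compute
$$\begin{aligned} d\big[e^{-rt}v(t,S(t),Y(t))\big] &= e^{-rt}\Big[-rv + v_t + rSv_x + rYv_y + \tfrac12\sigma_2^2S^2v_{xx} + \rho\sigma_1\sigma_2SYv_{xy} + \tfrac12\sigma_1^2Y^2v_{yy}\Big]dt\\ &\quad + e^{-rt}\Big[\rho\sigma_2Sv_x\,d\tilde B_1 + \sqrt{1-\rho^2}\,\sigma_2Sv_x\,d\tilde B_2 + \sigma_1Yv_y\,d\tilde B_1\Big]. \end{aligned}$$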
Figure 24.1: Boundary conditions for barrier option ($v(t,x,L)=0$ on the barrier $y=L$; $v(t,0,0)=0$ at the origin). Note that $t\in[0,T]$ is fixed.


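Setting the $dt$ term equal to 0, we obtain the PDE
$$-rv + v_t + rxv_x + ryv_y + \tfrac12\sigma_2^2x^2v_{xx} + \rho\sigma_1\sigma_2xyv_{xy} + \tfrac12\sigma_1^2y^2v_{yy} = 0,\quad 0\le t<T,\ x\ge0,\ 0\le y\le L.$$
The terminal condition is
$$v(T,x,y) = (x-K)^+,\quad x\ge0,\ 0\le y<L,$$
and the boundary conditions are
$$v(t,0,0) = 0,\quad 0\le t\le T,$$
$$v(t,x,L) = 0,\quad 0\le t\le T,\ x\ge0.$$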
On the $y=0$ boundary, the PDE reduces to
$$-rv + v_t + rxv_x + \tfrac12\sigma_2^2x^2v_{xx} = 0.$$
This is the usual Black-Scholes equation in $x$. The boundary condition is
$$v(t,0,0) = e^{-r(T-t)}(0-K)^+ = 0;$$
the terminal condition is
$$v(T,x,0) = (x-K)^+,\quad x\ge0.$$
On the $y=0$ boundary, the barrier is irrelevant, and the option value is given by the usual Black-Scholes formula for a European call.

On the $x=0$ boundary, the PDE reduces to
$$-rv + v_t + ryv_y + \tfrac12\sigma_1^2y^2v_{yy} = 0.$$
This is the usual Black-Scholes equation in $y$. The boundary conditions are
$$v(t,0,L) = 0,\qquad v(t,0,0) = 0;$$
the terminal condition is
$$v(T,0,y) = (0-K)^+ = 0,\quad y\ge0.$$
On the $x=0$ boundary, the option value is $v(t,0,y) = 0$, $0\le y\le L$.


24.3 The hedge 


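After setting the $dt$ term to 0, we have the equation
$$d\big[e^{-rt}v(t,S(t),Y(t))\big] = e^{-rt}\Big[\rho\sigma_2Sv_x\,d\tilde B_1 + \sqrt{1-\rho^2}\,\sigma_2Sv_x\,d\tilde B_2 + \sigma_1Yv_y\,d\tilde B_1\Big],$$
where $v_x = v_x(t,S(t),Y(t))$, $v_y = v_y(t,S(t),Y(t))$, and $\tilde B_1$, $\tilde B_2$, $S$, $Y$ are functions of $t$. Note that
$$d\big[e^{-rt}S(t)\big] = e^{-rt}\big[-rS(t)\,dt + dS(t)\big] = e^{-rt}\big[\rho\sigma_2S(t)\,d\tilde B_1(t) + \sqrt{1-\rho^2}\,\sigma_2S(t)\,d\tilde B_2(t)\big],$$
$$d\big[e^{-rt}Y(t)\big] = e^{-rt}\big[-rY(t)\,dt + dY(t)\big] = e^{-rt}\sigma_1Y(t)\,d\tilde B_1(t).$$
Therefore,
$$d\big[e^{-rt}v(t,S(t),Y(t))\big] = v_x\,d\big[e^{-rt}S\big] + v_y\,d\big[e^{-rt}Y\big].$$
Let $\Delta_2(t)$ denote the number of shares of stock held at time $t$, and let $\Delta_1(t)$ denote the number of "shares" of the barrier process $Y$. The value $X(t)$ of the portfolio has the differential
$$dX = \Delta_2\,dS + \Delta_1\,dY + r\big[X - \Delta_2S - \Delta_1Y\big]dt.$$
This is equivalent to
$$d\big[e^{-rt}X(t)\big] = \Delta_2(t)\,d\big[e^{-rt}S(t)\big] + \Delta_1(t)\,d\big[e^{-rt}Y(t)\big].$$
To get $X(t) = v(t,S(t),Y(t))$ for all $t$, we must have
$$X(0) = v(0,S(0),Y(0))$$
and
$$\Delta_2(t) = v_x(t,S(t),Y(t)),\qquad \Delta_1(t) = v_y(t,S(t),Y(t)).$$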


Chapter 25 


American Options 


This and the following chapters form part of the course Stochastic Differential Equations for Finance II.


25.1 Preview of perpetual American put 


$$dS = rS\,dt + \sigma S\,dB$$


Intrinsic value at time $t$: $(K-S(t))^+$.
Let $L\in[0,K]$ be given. Suppose we exercise the first time the stock price is $L$ or lower. We define
$$\tau_L = \min\{t\ge0;\ S(t)\le L\},$$
$$v_L(x) = \mathbb E^xe^{-r\tau_L}(K-S(\tau_L))^+ = \begin{cases}K-x & \text{if } x\le L,\\ (K-L)\,\mathbb E^xe^{-r\tau_L} & \text{if } x>L.\end{cases}$$
The plan is to compute $v_L(x)$ and then maximize over $L$ to find the optimal exercise price. We need to know the distribution of $\tau_L$.


25.2 First passage times for Brownian motion: first method 


(Based on the reflection principle)


Let $B$ be a Brownian motion under $\mathbb P$, let $x>0$ be given, and define
$$\tau = \min\{t\ge0;\ B(t)=x\}.$$
$\tau$ is called the first passage time to $x$. We compute the distribution of $\tau$.
Figure 25.1: Intrinsic value of perpetual American put (axes: stock price $x$, intrinsic value; $K$ marked).


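Define
$$M(t) = \max_{0\le u\le t}B(u).$$
From the first section of Chapter 20 we have
$$\mathbb P\{M(t)\in dm,\ B(t)\in db\} = \frac{2(2m-b)}{t\sqrt{2\pi t}}\exp\Big\{-\frac{(2m-b)^2}{2t}\Big\}\,dm\,db,\quad m>0,\ b<m.$$
Therefore,
$$\mathbb P\{M(t)\ge x\} = \int_x^\infty\int_{-\infty}^m\frac{2(2m-b)}{t\sqrt{2\pi t}}\exp\Big\{-\frac{(2m-b)^2}{2t}\Big\}\,db\,dm = \int_x^\infty\frac{2}{\sqrt{2\pi t}}e^{-m^2/(2t)}\,dm = \int_{x/\sqrt t}^\infty\frac{2}{\sqrt{2\pi}}e^{-z^2/2}\,dz.$$
Now
$$\mathbb P\{\tau\le t\} = \mathbb P\{M(t)\ge x\},$$
so
$$\mathbb P\{\tau\in dt\} = \frac{\partial}{\partial t}\mathbb P\{M(t)\ge x\}\,dt = \frac{\partial}{\partial t}\Big[\int_{x/\sqrt t}^\infty\frac{2}{\sqrt{2\pi}}e^{-z^2/2}\,dz\Big]dt = \frac{x}{t\sqrt{2\pi t}}\exp\Big\{-\frac{x^2}{2t}\Big\}\,dt.$$
We also have the Laplace transform formula
$$\mathbb Ee^{-\alpha\tau} = \int_0^\infty e^{-\alpha t}\,\mathbb P\{\tau\in dt\} = e^{-x\sqrt{2\alpha}},\quad \alpha>0.$$
(See Homework)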


Reference: Karatzas and Shreve, Brownian Motion and Stochastic Calculus, pp 95-96. 
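As a sanity check, the Laplace transform formula can be verified numerically by integrating $e^{-\alpha t}$ against the first passage density. This is a sketch, not part of the original notes: it uses a plain trapezoid rule after the substitution $t = e^s$, and the helper names and the integration window are arbitrary choices.

```python
import math


def first_passage_density(t, x):
    """P{tau in dt}/dt for tau = min{t >= 0: B(t) = x}, x > 0, t > 0."""
    return x / (t * math.sqrt(2.0 * math.pi * t)) * math.exp(-x * x / (2.0 * t))


def laplace_numeric(x, alpha, s_min=-20.0, s_max=10.0, n=40_000):
    """Trapezoid approximation of int_0^inf e^{-alpha t} P{tau in dt}.

    With t = e^s (dt = e^s ds) the integrand is smooth in s and negligible
    at both ends of [s_min, s_max] for the parameters used below.
    """
    h = (s_max - s_min) / n
    total = 0.0
    for i in range(n + 1):
        t = math.exp(s_min + i * h)
        f = math.exp(-alpha * t) * first_passage_density(t, x) * t
        total += 0.5 * f if i in (0, n) else f
    return total * h


def laplace_closed_form(x, alpha):
    """E[e^{-alpha tau}] = exp(-x sqrt(2 alpha))."""
    return math.exp(-x * math.sqrt(2.0 * alpha))
```

For example, `laplace_numeric(1.0, 1.0)` should agree with `math.exp(-math.sqrt(2.0))` to about six decimal places.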


25.3 Drift adjustment 


Reference: Karatzas and Shreve, Brownian Motion and Stochastic Calculus, pp 196-197.


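For $0\le t<\infty$, define
$$\tilde B(t) = \theta t + B(t),$$
$$Z(t) = \exp\{-\theta B(t) - \tfrac12\theta^2t\} = \exp\{-\theta\tilde B(t) + \tfrac12\theta^2t\}.$$
Define
$$\tilde\tau = \min\{t\ge0;\ \tilde B(t)=x\}.$$
We fix a finite time $T$ and change the probability measure "only up to $T$". More specifically, with $T$ fixed, define
$$\widetilde{\mathbb P}(A) = \int_AZ(T)\,d\mathbb P,\quad A\in\mathcal F(T).$$
Under $\widetilde{\mathbb P}$, the process $\tilde B(t)$, $0\le t\le T$, is a (nondrifted) Brownian motion, so
$$\widetilde{\mathbb P}\{\tilde\tau\in dt\} = \mathbb P\{\tau\in dt\} = \frac{x}{t\sqrt{2\pi t}}\exp\Big\{-\frac{x^2}{2t}\Big\}\,dt,\quad 0<t\le T.$$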
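For $0<t\le T$ we have
$$\begin{aligned} \mathbb P\{\tilde\tau\le t\} &= \widetilde{\mathbb E}\Big[1_{\{\tilde\tau\le t\}}\exp\{\theta\tilde B(T)-\tfrac12\theta^2T\}\Big]\\ &= \widetilde{\mathbb E}\Big[1_{\{\tilde\tau\le t\}}\,\widetilde{\mathbb E}\big[\exp\{\theta\tilde B(T)-\tfrac12\theta^2T\}\,\big|\,\mathcal F(\tilde\tau\wedge t)\big]\Big]\\ &= \widetilde{\mathbb E}\Big[1_{\{\tilde\tau\le t\}}\exp\{\theta\tilde B(\tilde\tau\wedge t)-\tfrac12\theta^2(\tilde\tau\wedge t)\}\Big]\\ &= \widetilde{\mathbb E}\Big[1_{\{\tilde\tau\le t\}}\exp\{\theta x-\tfrac12\theta^2\tilde\tau\}\Big]\\ &= \int_0^t\exp\{\theta x-\tfrac12\theta^2s\}\,\frac{x}{s\sqrt{2\pi s}}\exp\Big\{-\frac{x^2}{2s}\Big\}\,ds\\ &= \int_0^t\frac{x}{s\sqrt{2\pi s}}\exp\Big\{-\frac{(x-\theta s)^2}{2s}\Big\}\,ds. \end{aligned}$$
Therefore,
$$\mathbb P\{\tilde\tau\in dt\} = \frac{x}{t\sqrt{2\pi t}}\exp\Big\{-\frac{(x-\theta t)^2}{2t}\Big\}\,dt,\quad 0<t\le T.$$
Since $T$ is arbitrary, this must in fact be the correct formula for all $t>0$.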
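One way to check the drift-adjusted density is to integrate it numerically and compare with the known closed-form distribution function of the first passage time of drifted Brownian motion, $\mathbb P\{\tilde\tau\le t\} = N\big(\frac{\theta t - x}{\sqrt t}\big) + e^{2\theta x}N\big(\frac{-x-\theta t}{\sqrt t}\big)$. That closed form is a standard result quoted here, not derived in these notes. The sketch below is illustrative, and the function names and tolerances are assumptions.

```python
import math


def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def drifted_density(t, x, theta):
    """P{tau~ in dt}/dt for tau~ = min{t >= 0: theta*t + B(t) = x}, x > 0."""
    return x / (t * math.sqrt(2.0 * math.pi * t)) * math.exp(-(x - theta * t) ** 2 / (2.0 * t))


def cdf_numeric(t, x, theta, s_min=-25.0, n=20_000):
    """Trapezoid approximation of int_0^t density(u) du, using u = e^s."""
    s_max = math.log(t)
    h = (s_max - s_min) / n
    total = 0.0
    for i in range(n + 1):
        u = math.exp(s_min + i * h)
        f = drifted_density(u, x, theta) * u
        total += 0.5 * f if i in (0, n) else f
    return total * h


def cdf_closed_form(t, x, theta):
    """Standard closed form of P{tau~ <= t}; with theta = 0 it is 2 N(-x / sqrt(t))."""
    sq = math.sqrt(t)
    return (norm_cdf((theta * t - x) / sq)
            + math.exp(2.0 * theta * x) * norm_cdf((-x - theta * t) / sq))
```

Setting `theta = 0.0` recovers $\mathbb P\{M(t)\ge x\} = 2N(-x/\sqrt t)$ from the reflection-principle section.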


25.4 Drift-adjusted Laplace transform 


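Recall the Laplace transform formula for
$$\tau = \min\{t\ge0;\ B(t)=x\}$$
for nondrifted Brownian motion:
$$\mathbb Ee^{-\alpha\tau} = \int_0^\infty\frac{x}{t\sqrt{2\pi t}}\exp\Big\{-\alpha t-\frac{x^2}{2t}\Big\}\,dt = e^{-x\sqrt{2\alpha}},\quad \alpha>0,\ x>0.$$
For
$$\tilde\tau = \min\{t\ge0;\ \theta t+B(t)=x\},$$
the Laplace transform is
$$\begin{aligned} \mathbb Ee^{-\alpha\tilde\tau} &= \int_0^\infty\frac{x}{t\sqrt{2\pi t}}\exp\Big\{-\alpha t-\frac{(x-\theta t)^2}{2t}\Big\}\,dt\\ &= e^{x\theta}\int_0^\infty\frac{x}{t\sqrt{2\pi t}}\exp\Big\{-\big(\alpha+\tfrac12\theta^2\big)t-\frac{x^2}{2t}\Big\}\,dt\\ &= e^{x\theta-x\sqrt{2\alpha+\theta^2}},\quad \alpha>0, \end{aligned}$$
where in the last step we have used the formula for $\mathbb Ee^{-\alpha\tau}$ with $\alpha$ replaced by $\alpha+\tfrac12\theta^2$.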


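If $\tilde\tau(\omega)<\infty$, then
$$\lim_{\alpha\downarrow0}e^{-\alpha\tilde\tau(\omega)} = 1;$$
if $\tilde\tau(\omega)=\infty$, then $e^{-\alpha\tilde\tau(\omega)}=0$ for every $\alpha>0$, so
$$\lim_{\alpha\downarrow0}e^{-\alpha\tilde\tau(\omega)} = 0.$$
Therefore,
$$\lim_{\alpha\downarrow0}e^{-\alpha\tilde\tau} = 1_{\{\tilde\tau<\infty\}}.$$
Letting $\alpha\downarrow0$ and using the Monotone Convergence Theorem in the Laplace transform formula
$$\mathbb Ee^{-\alpha\tilde\tau} = e^{x\theta-x\sqrt{\theta^2+2\alpha}},$$
we obtain
$$\mathbb P\{\tilde\tau<\infty\} = e^{x\theta-x\sqrt{\theta^2}} = e^{x\theta-x|\theta|}.$$
If $\theta\ge0$, then
$$\mathbb P\{\tilde\tau<\infty\} = 1.$$
If $\theta<0$, then
$$\mathbb P\{\tilde\tau<\infty\} = e^{2x\theta}<1.$$
(Recall that $x>0$.)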
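The limit $\alpha\downarrow0$ can be illustrated numerically. The sketch below evaluates the drift-adjusted Laplace transform at a tiny $\alpha$ and also integrates the density over $(0,\infty)$. For $\theta\neq0$ both should reproduce $e^{x\theta-x|\theta|}$. The helper names, window and tolerances are illustrative assumptions.

```python
import math


def laplace_drifted(x, theta, alpha):
    """E[e^{-alpha tau~}] = exp(x*theta - x*sqrt(theta^2 + 2*alpha)), alpha > 0."""
    return math.exp(x * theta - x * math.sqrt(theta * theta + 2.0 * alpha))


def prob_finite(x, theta):
    """P{tau~ < infinity} = exp(x*theta - x*|theta|)."""
    return math.exp(x * theta - x * abs(theta))


def total_mass_numeric(x, theta, s_min=-25.0, s_max=12.0, n=40_000):
    """Trapezoid approximation of int_0^inf P{tau~ in dt} with t = e^s (needs theta != 0)."""
    h = (s_max - s_min) / n
    total = 0.0
    for i in range(n + 1):
        t = math.exp(s_min + i * h)
        # density(t) * t, where density(t) = x / (t sqrt(2 pi t)) exp(-(x - theta t)^2 / (2t))
        f = x / math.sqrt(2.0 * math.pi * t) * math.exp(-(x - theta * t) ** 2 / (2.0 * t))
        total += 0.5 * f if i in (0, n) else f
    return total * h
```

With negative drift the total mass falls short of 1, which is exactly the event $\{\tilde\tau=\infty\}$ where the path never reaches $x$.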


25.5 First passage times: Second method 


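(Based on martingales)


Let $\sigma>0$ be given. Then
$$Y(t) = \exp\{\sigma B(t)-\tfrac12\sigma^2t\}$$
is a martingale, so $Y(t\wedge\tau)$ is also a martingale. We have
$$1 = Y(0\wedge\tau) = \mathbb EY(t\wedge\tau) = \mathbb E\exp\{\sigma B(t\wedge\tau)-\tfrac12\sigma^2(t\wedge\tau)\} = \lim_{t\to\infty}\mathbb E\exp\{\sigma B(t\wedge\tau)-\tfrac12\sigma^2(t\wedge\tau)\}.$$
We want to take the limit inside the expectation. Since
$$0\le\exp\{\sigma B(t\wedge\tau)-\tfrac12\sigma^2(t\wedge\tau)\}\le e^{\sigma x},$$
this is justified by the Bounded Convergence Theorem. Therefore,
$$1 = \mathbb E\lim_{t\to\infty}\exp\{\sigma B(t\wedge\tau)-\tfrac12\sigma^2(t\wedge\tau)\}.$$
There are two possibilities. For those $\omega$ for which $\tau(\omega)<\infty$,
$$\lim_{t\to\infty}\exp\{\sigma B(t\wedge\tau)-\tfrac12\sigma^2(t\wedge\tau)\} = e^{\sigma x-\frac12\sigma^2\tau}.$$
For those $\omega$ for which $\tau(\omega)=\infty$,
$$\lim_{t\to\infty}\exp\{\sigma B(t\wedge\tau)-\tfrac12\sigma^2(t\wedge\tau)\}\le\lim_{t\to\infty}\exp\{\sigma x-\tfrac12\sigma^2t\} = 0.$$
Therefore,
$$1 = \mathbb E\lim_{t\to\infty}\exp\{\sigma B(t\wedge\tau)-\tfrac12\sigma^2(t\wedge\tau)\} = \mathbb Ee^{\sigma x-\frac12\sigma^2\tau},$$
where we understand $e^{\sigma x-\frac12\sigma^2\tau}$ to be zero if $\tau=\infty$.

Let $\alpha = \tfrac12\sigma^2$, so $\sigma = \sqrt{2\alpha}$. We have again derived the Laplace transform formula
$$e^{-x\sqrt{2\alpha}} = \mathbb Ee^{-\alpha\tau},\quad \alpha>0,\ x>0,$$
for the first passage time for nondrifted Brownian motion.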


25.6 Perpetual American put 


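$$dS = rS\,dt + \sigma S\,dB,$$
$$S(0) = x,$$
$$S(t) = x\exp\{(r-\tfrac12\sigma^2)t+\sigma B(t)\} = x\exp\Big\{\sigma\Big[\Big(\frac r\sigma-\frac\sigma2\Big)t+B(t)\Big]\Big\} = x\exp\{\sigma[\theta t+B(t)]\},$$
where $\theta = \frac r\sigma-\frac\sigma2$.
Intrinsic value of the put at time $t$: $(K-S(t))^+$.
Let $L\in[0,K]$ be given. Define for $x\ge L$,
$$\tau_L = \min\{t\ge0;\ S(t)=L\} = \min\Big\{t\ge0;\ \theta t+B(t)=\frac1\sigma\log\frac Lx\Big\} = \min\Big\{t\ge0;\ -\theta t-B(t)=\frac1\sigma\log\frac xL\Big\}.$$
Define
$$v_L(x) = (K-L)\,\mathbb Ee^{-r\tau_L} = (K-L)\exp\Big\{-\frac\theta\sigma\log\frac xL-\frac1\sigma\log\frac xL\sqrt{\theta^2+2r}\Big\} = (K-L)\Big(\frac xL\Big)^{-\frac1\sigma\left(\theta+\sqrt{\theta^2+2r}\right)},$$
where we used the drift-adjusted Laplace transform with $\theta$ replaced by $-\theta$ (the process $-B$ is again a Brownian motion), $x$ replaced by $\frac1\sigma\log\frac xL$, and $\alpha = r$.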


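We compute the exponent
$$\begin{aligned} -\frac1\sigma\Big(\theta+\sqrt{\theta^2+2r}\Big) &= -\frac1\sigma\Big(\frac r\sigma-\frac\sigma2+\sqrt{\Big(\frac r\sigma-\frac\sigma2\Big)^2+2r}\Big)\\ &= -\frac1\sigma\Big(\frac r\sigma-\frac\sigma2+\sqrt{\Big(\frac r\sigma+\frac\sigma2\Big)^2}\Big)\\ &= -\frac1\sigma\Big(\frac r\sigma-\frac\sigma2+\frac r\sigma+\frac\sigma2\Big)\\ &= -\frac{2r}{\sigma^2}. \end{aligned}$$
Therefore,
$$v_L(x) = \begin{cases}K-x, & 0\le x\le L,\\ (K-L)\Big(\dfrac xL\Big)^{-2r/\sigma^2}, & x\ge L.\end{cases}$$
The curves $(K-L)\big(\frac xL\big)^{-2r/\sigma^2}$ are all of the form $Cx^{-2r/\sigma^2}$.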


We want to choose the largest possible constant. The constant is
$$C = (K-L)L^{2r/\sigma^2},$$
and
$$\begin{aligned} \frac{\partial C}{\partial L} &= -L^{2r/\sigma^2}+\frac{2r}{\sigma^2}(K-L)L^{\frac{2r}{\sigma^2}-1}\\ &= L^{2r/\sigma^2}\Big[-1+\frac{2r}{\sigma^2}(K-L)\frac1L\Big]\\ &= L^{2r/\sigma^2}\Big[-\Big(1+\frac{2r}{\sigma^2}\Big)+\frac{2r}{\sigma^2}\cdot\frac KL\Big]. \end{aligned}$$
We solve
$$1+\frac{2r}{\sigma^2} = \frac{2r}{\sigma^2}\cdot\frac KL$$
to get
$$L = \frac{2rK}{\sigma^2+2r}.$$
Since $0<2r<\sigma^2+2r$, we have
$$0<L<K.$$

Figure 25.2: Value of perpetual American put (axes: stock price $x$, value; the level $(K-L)L^{2r/\sigma^2}$ and $K$ marked).

Figure 25.3: Curves.
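The maximization over $L$ can be checked by brute force: evaluate $C(L) = (K-L)L^{2r/\sigma^2}$ on a fine grid in $(0,K)$ and compare the grid maximizer with $2rK/(\sigma^2+2r)$. This is a sketch with hypothetical parameter values; since $C$ increases and then decreases on $(0,K)$, the grid maximizer lies within one grid step of the true one.

```python
import math


def constant_C(L, K, r, sigma):
    """C(L) = (K - L) * L^(2r/sigma^2), the coefficient of x^(-2r/sigma^2) in v_L."""
    return (K - L) * L ** (2.0 * r / sigma ** 2)


def optimal_level(K, r, sigma):
    """Closed form L* = 2rK / (sigma^2 + 2r)."""
    return 2.0 * r * K / (sigma ** 2 + 2.0 * r)


def grid_argmax(K, r, sigma, n=100_000):
    """Brute-force maximizer of C over the grid K*i/n, i = 1, ..., n-1."""
    best_L, best_C = None, -math.inf
    for i in range(1, n):
        L = K * i / n
        c = constant_C(L, K, r, sigma)
        if c > best_C:
            best_L, best_C = L, c
    return best_L
```

For $K=100$, $r=0.05$, $\sigma=0.3$ the closed form gives $L^*\approx52.63$.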


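Solution to the perpetual American put pricing problem (see Fig. 25.4):
$$v(x) = \begin{cases}K-x, & 0\le x\le L^*,\\ (K-L^*)\Big(\dfrac x{L^*}\Big)^{-2r/\sigma^2}, & x\ge L^*,\end{cases}$$
where
$$L^* = \frac{2rK}{\sigma^2+2r}.$$
Note that
$$v'(x) = \begin{cases}-1, & 0\le x<L^*,\\ -\dfrac{2r}{\sigma^2}(K-L^*)\dfrac1{L^*}\Big(\dfrac x{L^*}\Big)^{-\frac{2r}{\sigma^2}-1}, & x>L^*.\end{cases}$$
We have
$$\begin{aligned} \lim_{x\downarrow L^*}v'(x) &= -\frac{2r}{\sigma^2}(K-L^*)\frac1{L^*}\\ &= -\frac{2r}{\sigma^2}\Big(K-\frac{2rK}{\sigma^2+2r}\Big)\frac{\sigma^2+2r}{2rK}\\ &= -\frac{2r}{\sigma^2}\Big(\frac{\sigma^2+2r-2r}{\sigma^2+2r}\Big)\frac{\sigma^2+2r}{2r}\\ &= -1. \end{aligned}$$

Figure 25.4: Solution to perpetual American put (axes: stock price $x$, value; curve $(K-L^*)(x/L^*)^{-2r/\sigma^2}$; $L^*$ and $K$ marked).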
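The closed-form value can be probed numerically in two ways: a one-sided difference quotient at $L^*$ should be close to $-1$ (smooth pasting), and $v$ should dominate every $v_L$ from Section 25.6, because exercising at $L^*$ is optimal. The sketch below assumes hypothetical parameters and helper names.

```python
def value_for_level(x, L, K, r, sigma):
    """v_L(x): exercise the first time S falls to the level L (0 < L <= K)."""
    if x <= L:
        return K - x
    return (K - L) * (x / L) ** (-2.0 * r / sigma ** 2)


def perpetual_put(x, K, r, sigma):
    """v(x) = v_{L*}(x) with L* = 2rK / (sigma^2 + 2r)."""
    L_star = 2.0 * r * K / (sigma ** 2 + 2.0 * r)
    return value_for_level(x, L_star, K, r, sigma)
```

The difference quotient is only first-order accurate, so the tolerance in the check below is loose on purpose.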


25.7 Value of the perpetual American put 


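Set
$$\gamma = \frac{2r}{\sigma^2},\qquad L^* = \frac{2rK}{\sigma^2+2r} = \frac{\gamma}{\gamma+1}K.$$
If $0\le x\le L^*$, then $v(x) = K-x$. If $L^*<x<\infty$, then
$$v(x) = (K-L^*)(L^*)^\gamma x^{-\gamma}\qquad(7.1)$$
$$\phantom{v(x)} = (K-L^*)\,\mathbb E^xe^{-r\tau},\qquad(7.2)$$
where
$$S(0) = x,\qquad \tau = \min\{t\ge0;\ S(t)=L^*\}.$$
If $0\le x<L^*$, then
$$-rv(x)+rxv'(x)+\tfrac12\sigma^2x^2v''(x) = -r(K-x)+rx(-1) = -rK.\qquad(7.3)$$
If $L^*<x<\infty$, then
$$\begin{aligned} -rv(x)+rxv'(x)+\tfrac12\sigma^2x^2v''(x) &= (K-L^*)(L^*)^\gamma\Big[-rx^{-\gamma}-r\gamma x^{-\gamma}+\tfrac12\sigma^2\gamma(\gamma+1)x^{-\gamma}\Big]\\ &= (K-L^*)(L^*)^\gamma(1+\gamma)x^{-\gamma}\Big[-r+\tfrac12\sigma^2\gamma\Big] = 0.\qquad(7.4) \end{aligned}$$
In other words, $v$ solves the linear complementarity problem (see Fig. 25.5):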
Figure 25.5: Linear complementarity ($L^*$ and $K$ marked).

For all $x\ge0$, $x\ne L^*$,
$$rv-rxv'-\tfrac12\sigma^2x^2v''\ge0,\qquad(a)$$
$$v\ge(K-x)^+,\qquad(b)$$
One of the inequalities (a) or (b) is an equality. (c)

The half-line $[0,\infty)$ is divided into two regions:
$$\mathcal C = \{x;\ v(x)>(K-x)^+\},$$
$$\mathcal S = \{x;\ rv-rxv'-\tfrac12\sigma^2x^2v''>0\},$$
and $L^*$ is the boundary between them. If the stock price is in $\mathcal C$, the owner of the put should not exercise (should "continue"). If the stock price is in $\mathcal S$ or at $L^*$, the owner of the put should exercise (should "stop").
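The three conditions (a), (b), (c) can be verified pointwise with the exact derivatives of $v$. The sketch below returns the two residuals, $a = rv - rxv' - \frac12\sigma^2x^2v''$ and $b = v-(K-x)^+$, so that (a) means $a\ge0$, (b) means $b\ge0$, and (c) means one of them vanishes. Parameter values are hypothetical.

```python
def lcp_residuals(x, K, r, sigma):
    """Return (a, b) for the perpetual put at stock price x != L*."""
    gamma = 2.0 * r / sigma ** 2
    L_star = 2.0 * r * K / (sigma ** 2 + 2.0 * r)
    if x < L_star:                        # stopping region: v = K - x
        v, dv, d2v = K - x, -1.0, 0.0
    else:                                 # continuation region: v = C x^(-gamma)
        C = (K - L_star) * L_star ** gamma
        v = C * x ** (-gamma)
        dv = -gamma * C * x ** (-gamma - 1.0)
        d2v = gamma * (gamma + 1.0) * C * x ** (-gamma - 2.0)
    a = r * v - r * x * dv - 0.5 * sigma ** 2 * x ** 2 * d2v
    b = v - max(K - x, 0.0)
    return a, b
```

Below $L^*$ the residual $a$ equals $rK$, the consumption rate that reappears in the hedge of Section 25.8; above $L^*$ it vanishes.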


25.8 Hedging the put 


Let $S(0)$ be given. Sell the put at time zero for $v(S(0))$. Invest the money, holding $\Delta(t)$ shares of stock and consuming at rate $C(t)$ at time $t$. The value $X(t)$ of this portfolio is governed by
$$dX(t) = \Delta(t)\,dS(t)+r\big(X(t)-\Delta(t)S(t)\big)\,dt-C(t)\,dt,$$
or equivalently,
$$d\big(e^{-rt}X(t)\big) = -e^{-rt}C(t)\,dt+e^{-rt}\Delta(t)\sigma S(t)\,dB(t).$$
The discounted value of the put satisfies
$$\begin{aligned} d\big(e^{-rt}v(S(t))\big) &= e^{-rt}\Big[-rv(S(t))+rS(t)v'(S(t))+\tfrac12\sigma^2S^2(t)v''(S(t))\Big]dt+e^{-rt}\sigma S(t)v'(S(t))\,dB(t)\\ &= -rKe^{-rt}1_{\{S(t)<L^*\}}\,dt+e^{-rt}\sigma S(t)v'(S(t))\,dB(t). \end{aligned}$$
We should set
$$C(t) = rK1_{\{S(t)<L^*\}},$$
$$\Delta(t) = v'(S(t)).$$
Remark 25.1 If $S(t)<L^*$, then
$$v(S(t)) = K-S(t),\qquad \Delta(t) = v'(S(t)) = -1.$$
To hedge the put when $S(t)<L^*$, short one share of stock and hold $K$ in the money market. As long as the owner does not exercise, you can consume the interest from the money market position, i.e.,
$$C(t) = rK1_{\{S(t)<L^*\}}.$$
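The hedge can be written as a small function that, for a given stock price, returns the share position $\Delta = v'(S)$, the money-market position $v(S)-\Delta S$, and the consumption rate $rK1_{\{S<L^*\}}$. This is a sketch; the function name and the parameter values are illustrative assumptions.

```python
def put_hedge(s, K, r, sigma):
    """Hedge of a short perpetual American put at stock price s.

    Returns (delta, cash, consumption): delta = v'(s) shares of stock,
    cash = v(s) - delta * s in the money market, consumption rate rK 1{s < L*}.
    """
    gamma = 2.0 * r / sigma ** 2
    L_star = 2.0 * r * K / (sigma ** 2 + 2.0 * r)
    if s <= L_star:
        value, delta = K - s, -1.0
    else:
        C = (K - L_star) * L_star ** gamma
        value = C * s ** (-gamma)
        delta = -gamma * C * s ** (-gamma - 1.0)
    cash = value - delta * s
    consumption = r * K if s < L_star else 0.0
    return delta, cash, consumption
```

For $S(t)<L^*$ this reproduces Remark 25.1: short one share, hold $K$ in the money market, and consume the interest $rK$.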


Properties of $e^{-rt}v(S(t))$:
1. $e^{-rt}v(S(t))$ is a supermartingale (see its differential above).
2. $e^{-rt}v(S(t))\ge e^{-rt}(K-S(t))^+$, $0\le t<\infty$;
3. $e^{-rt}v(S(t))$ is the smallest process with properties 1 and 2.
Explanation of property 3. Let $Y$ be a supermartingale satisfying
$$Y(t)\ge e^{-rt}(K-S(t))^+,\quad 0\le t<\infty.\qquad(8.1)$$
Then property 3 says that
$$Y(t)\ge e^{-rt}v(S(t)),\quad 0\le t<\infty.\qquad(8.2)$$
We use (8.1) to prove (8.2) for $t=0$, i.e.,
$$Y(0)\ge v(S(0)).\qquad(8.3)$$
If $t$ is not zero, we can take $t$ to be the initial time and $S(t)$ to be the initial stock price, and then adapt the argument below to prove property (8.2).
Proof of (8.3), assuming $Y$ is a supermartingale satisfying (8.1):


Case I: $S(0)\le L^*$. We have
$$Y(0)\ \overset{(8.1)}{\ge}\ (K-S(0))^+ = v(S(0)).$$
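Case II: $S(0)>L^*$. Let $\tau = \min\{t\ge0;\ S(t)=L^*\}$. For $T>0$, we have
$$\begin{aligned} Y(0) &\ge \mathbb EY(\tau\wedge T) && \text{(stopped supermartingale is a supermartingale)}\\ &\ge \mathbb E\big[Y(\tau\wedge T)1_{\{\tau<\infty\}}\big] && \text{(since } Y\ge0). \end{aligned}$$
Now let $T\to\infty$ to get
$$\begin{aligned} Y(0) &\ge \liminf_{T\to\infty}\mathbb E\big[Y(\tau\wedge T)1_{\{\tau<\infty\}}\big]\\ &\ge \mathbb E\big[Y(\tau)1_{\{\tau<\infty\}}\big] && \text{(Fatou's Lemma)}\\ &\ge \mathbb E\big[e^{-r\tau}(K-S(\tau))^+1_{\{\tau<\infty\}}\big] && \text{(by (8.1))}\\ &= (K-L^*)\,\mathbb Ee^{-r\tau}\\ &= v(S(0)). && \text{(See eq. 7.2)} \end{aligned}$$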


25.9 Perpetual American contingent claim 


Intrinsic value: $h(S(t))$.
Value of the American contingent claim:
$$v(x) = \sup_\tau\mathbb E^x\big[e^{-r\tau}h(S(\tau))\big],$$
where the supremum is over all stopping times.


Optimal exercise rule: Any stopping time $\tau$ which attains the supremum.


Characterization of $v$:


1. $e^{-rt}v(S(t))$ is a supermartingale;
2. $e^{-rt}v(S(t))\ge e^{-rt}h(S(t))$, $0\le t<\infty$;
3. $e^{-rt}v(S(t))$ is the smallest process with properties 1 and 2.


25.10 Perpetual American call 


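$$v(x) = \sup_\tau\mathbb E^x\big[e^{-r\tau}(S(\tau)-K)^+\big]$$


Theorem 10.63
$$v(x) = x\quad\forall x\ge0.$$
Proof: For every $t$,
$$v(x)\ge\mathbb E^x\big[e^{-rt}(S(t)-K)^+\big]\ge\mathbb E^x\big[e^{-rt}(S(t)-K)\big] = x-e^{-rt}K.$$
Let $t\to\infty$ to get $v(x)\ge x$.
Now start with $S(0)=x$ and define
$$Y(t) = e^{-rt}S(t).$$
Then:
1. $Y$ is a supermartingale (in fact, $Y$ is a martingale);
2. $Y(t)\ge e^{-rt}(S(t)-K)^+$, $0\le t<\infty$.
Therefore, $Y(0)\ge v(S(0))$, i.e.,
$$x\ge v(x).$$


Remark 25.2 No matter what $\tau$ we choose,
$$\mathbb E^x\big[e^{-r\tau}(S(\tau)-K)^+\big]<\mathbb E^x\big[e^{-r\tau}S(\tau)\big]\le x = v(x).$$
There is no optimal exercise time.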


25.11 Put with expiration 


Expiration time: $T>0$.
Intrinsic value: $(K-S(t))^+$.
Value of the put:
$$v(t,x) = (\text{value of the put at time } t \text{ if } S(t)=x) = \sup_{\substack{t\le\tau\le T\\ \tau\ \text{stopping time}}}\mathbb E^{t,x}e^{-r(\tau-t)}(K-S(\tau))^+.$$
See Fig. 25.6. It can be shown that $v$, $v_t$, $v_x$ are continuous across the boundary, while $v_{xx}$ has a jump.

Figure 25.6: Value of put with expiration (terminal values $v(T,x) = K-x$ for $0\le x\le K$ and $v(T,x)=0$ for $x\ge K$).

Let $S(0)$ be given. Then
1. $e^{-rt}v(t,S(t))$, $0\le t\le T$, is a supermartingale;
2. $e^{-rt}v(t,S(t))\ge e^{-rt}(K-S(t))^+$, $0\le t\le T$;
3. $e^{-rt}v(t,S(t))$ is the smallest process with properties 1 and 2.
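The put with expiration has no closed form, but the binomial model of Chapter 1 gives a convenient approximation: at each node take the larger of the discounted risk-neutral continuation value and the intrinsic value $(K-x)^+$, which is a discrete analogue of the supermartingale characterization above. The sketch below uses the Cox-Ross-Rubinstein choice $u = e^{\sigma\sqrt{\Delta t}}$, $d = 1/u$; that parameterization and the function name are assumptions, not taken from these notes.

```python
import math


def crr_put(S0, K, r, sigma, T, n=500, american=True):
    """Cox-Ross-Rubinstein binomial approximation of a put price at time 0."""
    dt = T / n
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    disc = math.exp(-r * dt)
    p = (math.exp(r * dt) - d) / (u - d)             # risk-neutral up probability
    # terminal payoffs; index j = number of up moves
    values = [max(K - S0 * u ** j * d ** (n - j), 0.0) for j in range(n + 1)]
    for step in range(n - 1, -1, -1):                # backward induction
        for j in range(step + 1):
            cont = disc * (p * values[j + 1] + (1.0 - p) * values[j])
            if american:
                exercise = K - S0 * u ** j * d ** (step - j)
                values[j] = max(cont, exercise)      # stop or continue
            else:
                values[j] = cont
    return values[0]
```

The European version (`american=False`) should be close to the Black-Scholes put price, and the American value should exceed it by the early-exercise premium.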


25.12 American contingent claim with expiration 


Expiration time: $T>0$.
Intrinsic value: $h(S(t))$.


Value of the contingent claim:
$$v(t,x) = \sup_{t\le\tau\le T}\mathbb E^{t,x}e^{-r(\tau-t)}h(S(\tau)).$$
Then
$$rv-v_t-rxv_x-\tfrac12\sigma^2x^2v_{xx}\ge0,\qquad(a)$$
$$v\ge h(x),\qquad(b)$$
At every point $(t,x)\in[0,T]\times[0,\infty)$, either (a) or (b) is an equality. (c)


Characterization of $v$: Let $S(0)$ be given. Then
1. $e^{-rt}v(t,S(t))$, $0\le t\le T$, is a supermartingale;
2. $e^{-rt}v(t,S(t))\ge e^{-rt}h(S(t))$;
3. $e^{-rt}v(t,S(t))$ is the smallest process with properties 1 and 2.


The optimal exercise time is
$$\tau = \min\{t\ge0;\ v(t,S(t)) = h(S(t))\}.$$
If $\tau(\omega)=\infty$, then there is no optimal exercise time along the particular path $\omega$.


Chapter 26 


Options on dividend-paying stocks 


26.1 American option with convex payoff function 


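Theorem 1.64 Consider the stock price process
$$dS(t) = r(t)S(t)\,dt+\sigma(t)S(t)\,dB(t),$$
where $r$ and $\sigma$ are processes and $r(t)\ge0$, $0\le t\le T$, a.s. This stock pays no dividends.
Let $h(x)$ be a convex function of $x\ge0$, and assume $h(0)=0$. (E.g., $h(x) = (x-K)^+$.) An American contingent claim paying $h(S(t))$ if exercised at time $t$ does not need to be exercised before expiration, i.e., waiting until expiration to decide whether to exercise entails no loss of value.


Proof: For $0\le\lambda\le1$ and $x\ge0$, we have
$$h(\lambda x) = h\big((1-\lambda)0+\lambda x\big)\le(1-\lambda)h(0)+\lambda h(x) = \lambda h(x).$$

Figure 26.1: Convex payoff function

Let $T$ be the time of expiration of the contingent claim, and let $\beta(t) = \exp\{\int_0^tr(u)\,du\}$ be the accumulation factor. For $0\le t\le T$,
$$\frac{\beta(t)}{\beta(T)} = \exp\Big\{-\int_t^Tr(u)\,du\Big\}\le1,$$
and $S(T)\ge0$, so
$$h\Big(\frac{\beta(t)}{\beta(T)}S(T)\Big)\le\frac{\beta(t)}{\beta(T)}h(S(T)).\qquad(*)$$
Consider a European contingent claim paying $h(S(T))$ at time $T$. The value of this claim at time $t\in[0,T]$ is
$$X(t) = \beta(t)\,\mathbb E\Big[\frac1{\beta(T)}h(S(T))\,\Big|\,\mathcal F(t)\Big].$$
Therefore,
$$\begin{aligned} X(t) &= \beta(t)\,\mathbb E\Big[\frac1{\beta(T)}h(S(T))\,\Big|\,\mathcal F(t)\Big]\\ &\ge \beta(t)\,\mathbb E\Big[\frac1{\beta(t)}h\Big(\frac{\beta(t)}{\beta(T)}S(T)\Big)\,\Big|\,\mathcal F(t)\Big] && \text{(by } (*))\\ &\ge h\Big(\mathbb E\Big[\frac{\beta(t)}{\beta(T)}S(T)\,\Big|\,\mathcal F(t)\Big]\Big) && \text{(Jensen's inequality)}\\ &= h\Big(\beta(t)\frac{S(t)}{\beta(t)}\Big) && \Big(\tfrac{S(t)}{\beta(t)} \text{ is a martingale}\Big)\\ &= h(S(t)). \end{aligned}$$
This shows that the value $X(t)$ of the European contingent claim dominates the intrinsic value $h(S(t))$ of the American claim. In fact, except in degenerate cases, the inequality
$$X(t)\ge h(S(t)),\quad 0\le t<T,$$
is strict, i.e., the American claim should not be exercised prior to expiration. ∎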
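For the call payoff $h(x) = (x-K)^+$ with constant $r\ge0$ and $\sigma$, the European value $X(t)$ is the Black-Scholes formula, so the theorem's two inequalities can be spot-checked on a grid: $h(\lambda x)\le\lambda h(x)$, and $X(t)\ge h(S(t))$ for every time to expiry. This is an illustrative sketch with hypothetical parameters, not a proof.

```python
import math


def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def bs_call(x, K, r, sigma, tau):
    """European call value X(t) for x > 0 and time to expiry tau = T - t > 0."""
    sq = sigma * math.sqrt(tau)
    d_plus = (math.log(x / K) + (r + 0.5 * sigma ** 2) * tau) / sq
    return x * norm_cdf(d_plus) - K * math.exp(-r * tau) * norm_cdf(d_plus - sq)


def h(x, K=100.0):
    """Convex payoff with h(0) = 0."""
    return max(x - K, 0.0)
```

The strict inequality shows up clearly near the strike, where the call value is positive while the intrinsic value is zero or small.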


26.2 Dividend paying stock 


Let $r$ and $\sigma$ be constant, let $\delta$ be a "dividend coefficient" satisfying
$$0<\delta<1.$$
Let $T>0$ be an expiration time, and let $t_1\in(0,T)$ be the time of dividend payment. The stock price is given by
$$S(t) = \begin{cases}S(0)\exp\{(r-\frac12\sigma^2)t+\sigma B(t)\}, & 0\le t\le t_1,\\ (1-\delta)S(t_1)\exp\{(r-\frac12\sigma^2)(t-t_1)+\sigma(B(t)-B(t_1))\}, & t_1<t\le T.\end{cases}$$
Consider an American call on this stock. At times $t\in(t_1,T)$, it is not optimal to exercise, so the value of the call is given by the usual Black-Scholes formula
$$v(t,x) = xN\big(d_+(T-t,x)\big)-Ke^{-r(T-t)}N\big(d_-(T-t,x)\big),\quad t_1<t\le T,$$
where
$$d_\pm(T-t,x) = \frac1{\sigma\sqrt{T-t}}\Big[\log\frac xK+(T-t)\Big(r\pm\frac{\sigma^2}2\Big)\Big].$$
At time $t_1$, immediately after payment of the dividend, the value of the call is
$$v\big(t_1,(1-\delta)S(t_1)\big).$$
At time $t_1$, immediately before payment of the dividend, the value of the call is
$$w\big(t_1,S(t_1)\big),$$
where
$$w(t_1,x) = \max\big\{(x-K)^+,\ v(t_1,(1-\delta)x)\big\}.$$


Theorem 2.65 For $0\le t\le t_1$, the value of the American call is $w(t,S(t))$, where
$$w(t,x) = \mathbb E^{t,x}\big[e^{-r(t_1-t)}w(t_1,S(t_1))\big].$$
This function satisfies the usual Black-Scholes equation
$$-rw+w_t+rxw_x+\tfrac12\sigma^2x^2w_{xx} = 0,\quad 0\le t<t_1,\ x>0,$$
(where $w = w(t,x)$) with terminal condition
$$w(t_1,x) = \max\big\{(x-K)^+,\ v(t_1,(1-\delta)x)\big\},\quad x>0,$$
and boundary condition
$$w(t,0) = 0,\quad 0\le t\le t_1.$$
The hedging portfolio is
$$\Delta(t) = \begin{cases}w_x(t,S(t)), & 0\le t\le t_1,\\ v_x(t,S(t)), & t_1<t\le T.\end{cases}$$


Proof: We only need to show that an American contingent claim with payoff $w(t_1,S(t_1))$ at time $t_1$ need not be exercised before time $t_1$. According to Theorem 1.64, it suffices to prove
1. $w(t_1,0) = 0$,
2. $w(t_1,x)$ is convex in $x$.
Since $v(t_1,0)=0$, we have immediately that
$$w(t_1,0) = \max\big\{(0-K)^+,\ v(t_1,(1-\delta)0)\big\} = 0.$$
To prove that $w(t_1,x)$ is convex in $x$, we need to show that $v(t_1,(1-\delta)x)$ is convex in $x$. Obviously, $(x-K)^+$ is convex in $x$, and the maximum of two convex functions is convex. The proof of the convexity of $v(t_1,(1-\delta)x)$ in $x$ is left as a homework problem. ∎
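Theorem 2.65 reduces the American call to a European-style computation: $w(0,S(0)) = e^{-rt_1}\,\mathbb E\big[w(t_1,S(t_1))\big]$ with $w(t_1,x) = \max\{(x-K)^+,\,v(t_1,(1-\delta)x)\}$. The sketch below evaluates that expectation by a trapezoid rule over the standard normal variable driving $S(t_1)$. The function names, grid and parameter values are assumptions. With $\delta=0$ it should reproduce the Black-Scholes call, and with a large dividend it should exceed the European call on the dividend-paying stock, which is $v(0,(1-\delta)S(0))$ computed with time to expiry $T$.

```python
import math


def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def bs_call(x, K, r, sigma, tau):
    """Black-Scholes call value for x > 0 and time to expiry tau > 0."""
    sq = sigma * math.sqrt(tau)
    d_plus = (math.log(x / K) + (r + 0.5 * sigma ** 2) * tau) / sq
    return x * norm_cdf(d_plus) - K * math.exp(-r * tau) * norm_cdf(d_plus - sq)


def american_call_one_dividend(S0, K, r, sigma, delta, t1, T, n=4_000, z_max=8.0):
    """w(0, S0) = e^{-r t1} E[max{(S(t1) - K)^+, v(t1, (1 - delta) S(t1))}].

    S(t1) = S0 exp{(r - sigma^2/2) t1 + sigma sqrt(t1) Z}, Z standard normal;
    the expectation is a trapezoid rule in z on [-z_max, z_max].
    """
    h = 2.0 * z_max / n
    total = 0.0
    for i in range(n + 1):
        z = -z_max + i * h
        x = S0 * math.exp((r - 0.5 * sigma ** 2) * t1 + sigma * math.sqrt(t1) * z)
        w = max(x - K, 0.0, bs_call((1.0 - delta) * x, K, r, sigma, T - t1))
        weight = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += w * weight * (0.5 if i in (0, n) else 1.0)
    return math.exp(-r * t1) * total * h
```

The `max` inside the loop is where the exercise decision of Section 26.3 enters: Case II paths are exactly those where `x - K` wins.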


26.3 Hedging at time $t_1$


Let $x = S(t_1)$.

Case I: $v(t_1,(1-\delta)x)\ge(x-K)^+$.

The option need not be exercised at time $t_1$ (should not be exercised if the inequality is strict). We have
$$\Delta(t_1) = w_x(t_1,x) = (1-\delta)v_x\big(t_1,(1-\delta)x\big) = (1-\delta)\Delta(t_1+),$$
where
$$\Delta(t_1+) = \lim_{t\downarrow t_1}\Delta(t)$$
is the number of shares of stock held by the hedge immediately after payment of the dividend. The post-dividend position can be achieved by reinvesting in stock the dividends received on the stock held in the hedge. Indeed,
$$\begin{aligned} \Delta(t_1+) &= \frac1{1-\delta}\Delta(t_1) = \Delta(t_1)+\frac{\delta}{1-\delta}\Delta(t_1)\\ &= \Delta(t_1)+\frac{\delta\Delta(t_1)S(t_1)}{(1-\delta)S(t_1)}\\ &= \#\text{ of shares held when dividend is paid}+\frac{\text{dividend received}}{\text{price per share when dividend is reinvested}}. \end{aligned}$$


Case II: $v(t_1,(1-\delta)x)<(x-K)^+$.

The owner of the option should exercise before the dividend payment at time $t_1$ and receive $x-K$. The hedge has been constructed so the seller of the option has $x-K$ before the dividend payment at time $t_1$. If the option is not exercised, its value drops from $x-K$ to $v(t_1,(1-\delta)x)$, and the seller of the option can pocket the difference and continue the hedge.


Chapter 27 


Bonds, forward contracts and futures 


Let $\{W(t),\mathcal F(t);\ 0\le t\le T\}$ be a Brownian motion (Wiener process) on some $(\Omega,\mathcal F,\mathbb P)$. Consider an asset, which we call a stock, whose price satisfies
$$dS(t) = r(t)S(t)\,dt+\sigma(t)S(t)\,dW(t).$$
Here, $r$ and $\sigma$ are adapted processes, and we have already switched to the risk-neutral measure, which we call $\mathbb P$. Assume that every martingale under $\mathbb P$ can be represented as an integral with respect to $W$.


Define the accumulation factor
$$\beta(t) = \exp\Big\{\int_0^tr(u)\,du\Big\}.$$
A zero-coupon bond, maturing at time $T$, pays 1 at time $T$ and nothing before time $T$. According to the risk-neutral pricing formula, its value at time $t\in[0,T]$ is
$$B(t,T) = \beta(t)\,\mathbb E\Big[\frac1{\beta(T)}\,\Big|\,\mathcal F(t)\Big] = \mathbb E\Big[\exp\Big\{-\int_t^Tr(u)\,du\Big\}\,\Big|\,\mathcal F(t)\Big].$$


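Given $B(t,T)$ dollars at time $t$, one can construct a portfolio of investment in the stock and money market so that the portfolio value at time $T$ is 1 almost surely. Indeed, for some process $\gamma$,
$$\begin{aligned} B(t,T) &= \beta(t)\underbrace{\mathbb E\Big[\frac1{\beta(T)}\,\Big|\,\mathcal F(t)\Big]}_{\text{martingale}}\\ &= \beta(t)\Big[\mathbb E\frac1{\beta(T)}+\int_0^t\gamma(u)\,dW(u)\Big]\\ &= \beta(t)\Big[B(0,T)+\int_0^t\gamma(u)\,dW(u)\Big], \end{aligned}$$
$$\begin{aligned} dB(t,T) &= r(t)\beta(t)\Big[B(0,T)+\int_0^t\gamma(u)\,dW(u)\Big]dt+\beta(t)\gamma(t)\,dW(t)\\ &= r(t)B(t,T)\,dt+\beta(t)\gamma(t)\,dW(t). \end{aligned}$$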


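The value of a portfolio satisfies
$$dX(t) = \Delta(t)\,dS(t)+r(t)\big[X(t)-\Delta(t)S(t)\big]dt = r(t)X(t)\,dt+\Delta(t)\sigma(t)S(t)\,dW(t).$$
We set
$$\Delta(t) = \frac{\beta(t)\gamma(t)}{\sigma(t)S(t)}.$$
If, at any time $t$, $X(t) = B(t,T)$ and we use the portfolio $\Delta(u)$, $t\le u\le T$, then we will have
$$X(T) = B(T,T) = 1.$$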


If $r(t)$ is nonrandom for all $t$, then
$$B(t,T) = \exp\Big\{-\int_t^Tr(u)\,du\Big\},$$
$$dB(t,T) = r(t)B(t,T)\,dt,$$
i.e., $\gamma = 0$. Then $\Delta$ given above is zero. If, at time $t$, you are given $B(t,T)$ dollars and you always invest only in the money market, then at time $T$ you will have
$$B(t,T)\exp\Big\{\int_t^Tr(u)\,du\Big\} = 1.$$
If $r(t)$ is random for all $t$, then $\gamma$ is not zero. One generally has three different instruments: the stock, the money market, and the zero coupon bond. Any two of them are sufficient for hedging, and the two which are most convenient can depend on the instrument being hedged.
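27.1 Forward contracts


We continue with the set-up for zero-coupon bonds. The $T$-forward price of the stock at time $t\in[0,T]$ is the $\mathcal F(t)$-measurable price, agreed upon at time $t$, for purchase of a share of stock at time $T$, chosen so the forward contract has value zero at time $t$. In other words,
$$\beta(t)\,\mathbb E\Big[\frac1{\beta(T)}\big(S(T)-F(t)\big)\,\Big|\,\mathcal F(t)\Big] = 0,\quad 0\le t\le T.$$
We solve for $F(t)$:
$$\begin{aligned} 0 &= \beta(t)\,\mathbb E\Big[\frac1{\beta(T)}\big(S(T)-F(t)\big)\,\Big|\,\mathcal F(t)\Big]\\ &= \beta(t)\,\mathbb E\Big[\frac{S(T)}{\beta(T)}\,\Big|\,\mathcal F(t)\Big]-F(t)\,\beta(t)\,\mathbb E\Big[\frac1{\beta(T)}\,\Big|\,\mathcal F(t)\Big]\\ &= \beta(t)\frac{S(t)}{\beta(t)}-F(t)B(t,T)\\ &= S(t)-F(t)B(t,T). \end{aligned}$$
This implies that
$$F(t) = \frac{S(t)}{B(t,T)}.$$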


Remark 27.1 (Value vs. Forward price) The $T$-forward price $F(t)$ is not the value at time $t$ of the forward contract. The value of the contract at time $t$ is zero. $F(t)$ is the price agreed upon at time $t$ which will be paid for the stock at time $T$.


27.2 Hedging a forward contract 


Enter a forward contract at time 0, i.e., agree to pay $F(0) = \frac{S(0)}{B(0,T)}$ for a share of stock at time $T$. At time zero, this contract has value 0. At later times, however, it does not. In fact, its value at time $t\in[0,T]$ is
$$\begin{aligned} V(t) &= \beta(t)\,\mathbb E\Big[\frac1{\beta(T)}\big(S(T)-F(0)\big)\,\Big|\,\mathcal F(t)\Big]\\ &= \beta(t)\,\mathbb E\Big[\frac{S(T)}{\beta(T)}\,\Big|\,\mathcal F(t)\Big]-F(0)\,\beta(t)\,\mathbb E\Big[\frac1{\beta(T)}\,\Big|\,\mathcal F(t)\Big]\\ &= S(t)-F(0)B(t,T). \end{aligned}$$
This suggests the following hedge of a short position in the forward contract. At time 0, short $F(0)$ $T$-maturity zero-coupon bonds. This generates income
$$F(0)B(0,T) = \frac{S(0)}{B(0,T)}B(0,T) = S(0).$$
Buy one share of stock. This portfolio requires no initial investment. Maintain this position until time $T$, when the portfolio is worth
$$S(T)-F(0)B(T,T) = S(T)-F(0).$$
Deliver the share of stock and receive payment $F(0)$.


A short position in the forward could also be hedged using the stock and money market, but the 
implementation of this hedge would require a term-structure model. 
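To see the static hedge numerically, assume (for illustration only) a nonrandom constant interest rate $r$, so that $B(t,T) = e^{-r(T-t)}$. The sketch computes $F(0) = S(0)/B(0,T)$ and the value of the hedge portfolio (long one share, short $F(0)$ zeros), which equals $V(t) = S(t)-F(0)B(t,T)$. The function names and numbers are hypothetical.

```python
import math


def zero_coupon(t, T, r):
    """B(t, T) = exp(-r (T - t)), valid only under the constant-rate assumption."""
    return math.exp(-r * (T - t))


def forward_price(S_t, t, T, r):
    """T-forward price F(t) = S(t) / B(t, T)."""
    return S_t / zero_coupon(t, T, r)


def hedge_value(S_t, t, T, r, F0):
    """Long one share, short F(0) T-maturity zeros: S(t) - F(0) B(t, T) = V(t)."""
    return S_t - F0 * zero_coupon(t, T, r)
```

The hedge costs nothing at time 0 and is worth $S(T)-F(0)$ at time $T$, matching the delivery obligation.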


27.3 Future contracts 


Future contracts are designed to remove the risk of default inherent in forward contracts. Through 
the device of marking to market, the value of the future contract is maintained at zero at all times. 
Thus, either party can close out his/her position at any time. 


Let us first consider the situation with discrete trading dates
$$0 = t_0<t_1<\dots<t_n = T.$$
On each $[t_j,t_{j+1})$, $r$ is constant, so
$$\beta(t_{k+1}) = \exp\Big\{\int_0^{t_{k+1}}r(u)\,du\Big\} = \exp\Big\{\sum_{j=0}^kr(t_j)(t_{j+1}-t_j)\Big\}$$
is $\mathcal F(t_k)$-measurable.


Enter a future contract at time $t_k$, taking the long position, when the future price is $\Phi(t_k)$. At time $t_{k+1}$, when the future price is $\Phi(t_{k+1})$, you receive a payment $\Phi(t_{k+1})-\Phi(t_k)$. (If the price has fallen, you make the payment $-(\Phi(t_{k+1})-\Phi(t_k))$.) The mechanism for receiving and making these payments is the margin account held by the broker.


By time $T = t_n$, you have received the sequence of payments
$$\Phi(t_{k+1})-\Phi(t_k),\ \Phi(t_{k+2})-\Phi(t_{k+1}),\ \dots,\ \Phi(t_n)-\Phi(t_{n-1})$$
at times $t_{k+1},t_{k+2},\dots,t_n$. The value at time $t = t_k$ of this sequence is
$$\beta(t_k)\,\mathbb E\Big[\sum_{j=k}^{n-1}\frac1{\beta(t_{j+1})}\big(\Phi(t_{j+1})-\Phi(t_j)\big)\,\Big|\,\mathcal F(t_k)\Big].$$
Because it costs nothing to enter the future contract at time $t_k$, this expression must be zero almost surely.
The continuous-time version of this condition is
$$\beta(t)\,\mathbb E\Big[\int_t^T\frac1{\beta(u)}\,d\Phi(u)\,\Big|\,\mathcal F(t)\Big] = 0,\quad 0\le t\le T.$$
Note that $\beta(t_{j+1})$ appearing in the discrete-time version is $\mathcal F(t_j)$-measurable, as it should be when approximating a stochastic integral.


Definition 27.1 The $T$-future price of the stock is any $\mathcal F(t)$-adapted stochastic process
$$\{\Phi(t);\ 0\le t\le T\},$$
satisfying
$$\Phi(T) = S(T)\ \text{a.s., and}\qquad(a)$$
$$\mathbb E\Big[\int_t^T\frac1{\beta(u)}\,d\Phi(u)\,\Big|\,\mathcal F(t)\Big] = 0,\quad 0\le t\le T.\qquad(b)$$


Theorem 3.66 The unique process satisfying (a) and (b) is
$$\Phi(t) = \mathbb E\big[S(T)\,\big|\,\mathcal F(t)\big],\quad 0\le t\le T.$$


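Proof: We first show that (b) holds if and only if $\Phi$ is a martingale. If $\Phi$ is a martingale, then $\int_0^t\frac1{\beta(u)}\,d\Phi(u)$ is also a martingale, so
$$\mathbb E\Big[\int_t^T\frac1{\beta(u)}\,d\Phi(u)\,\Big|\,\mathcal F(t)\Big] = \mathbb E\Big[\int_0^T\frac1{\beta(u)}\,d\Phi(u)\,\Big|\,\mathcal F(t)\Big]-\int_0^t\frac1{\beta(u)}\,d\Phi(u) = 0.$$
On the other hand, if (b) holds, then the martingale
$$M(t) = \mathbb E\Big[\int_0^T\frac1{\beta(u)}\,d\Phi(u)\,\Big|\,\mathcal F(t)\Big]$$
satisfies
$$M(t) = \int_0^t\frac1{\beta(u)}\,d\Phi(u)+\mathbb E\Big[\int_t^T\frac1{\beta(u)}\,d\Phi(u)\,\Big|\,\mathcal F(t)\Big] = \int_0^t\frac1{\beta(u)}\,d\Phi(u),\quad 0\le t\le T.$$
This implies
$$dM(t) = \frac1{\beta(t)}\,d\Phi(t),\qquad d\Phi(t) = \beta(t)\,dM(t),$$
and so $\Phi$ is a martingale (its differential has no $dt$ term).


Now define
$$\Phi(t) = \mathbb E\big[S(T)\,\big|\,\mathcal F(t)\big],\quad 0\le t\le T.$$
Clearly (a) is satisfied. By the tower property, $\Phi$ is a martingale, so (b) is also satisfied. Indeed, this $\Phi$ is the only martingale satisfying (a). ∎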


27.4 Cash flow from a future contract 


With a forward contract, entered at time 0, the buyer agrees to pay $F(0)$ for an asset valued at $S(T)$. The only payment is at time $T$.


With a future contract, entered at time 0, the buyer receives a cash flow (which may at times be negative) between times 0 and $T$. If he still holds the contract at time $T$, then he pays $S(T)$ at time $T$ for an asset valued at $S(T)$. The cash flow received between times 0 and $T$ sums to
$$\int_0^Td\Phi(u) = \Phi(T)-\Phi(0) = S(T)-\Phi(0).$$
Thus, if the future contract holder takes delivery at time $T$, he has paid a total of
$$\big(\Phi(0)-S(T)\big)+S(T) = \Phi(0)$$
for an asset valued at $S(T)$.


27.5 Forward-future spread 


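Future price:
$$\Phi(t) = \mathbb E\big[S(T)\,\big|\,\mathcal F(t)\big].$$


Forward price:
$$F(t) = \frac{S(t)}{B(t,T)} = \frac{\mathbb E\big[\frac{S(T)}{\beta(T)}\,\big|\,\mathcal F(t)\big]}{\mathbb E\big[\frac1{\beta(T)}\,\big|\,\mathcal F(t)\big]}.$$
At time 0 this gives
$$F(0)-\Phi(0) = \frac{\mathbb E\big[\frac{S(T)}{\beta(T)}\big]-\mathbb E\big[\frac1{\beta(T)}\big]\,\mathbb ES(T)}{\mathbb E\big[\frac1{\beta(T)}\big]} = \frac{\operatorname{Cov}\big(\frac1{\beta(T)},S(T)\big)}{\mathbb E\big[\frac1{\beta(T)}\big]}.$$
If $\frac1{\beta(T)}$ and $S(T)$ are uncorrelated, then
$$\Phi(0) = F(0).$$
If $\frac1{\beta(T)}$ and $S(T)$ are positively correlated, then
$$\Phi(0)<F(0).$$


This is the case that a rise in stock price tends to occur with a fall in the interest rate. The owner of the future tends to receive income when the stock price rises, but invests it at a declining interest rate. If the stock price falls, the owner usually must make payments on the future contract. He withdraws from the money market to do this just as the interest rate rises. In short, the long position in the future is hurt by positive correlation between $\frac1{\beta(T)}$ and $S(T)$. The buyer of the future is compensated by a reduction of the future price below the forward price.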
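A toy discrete example makes the sign of the spread concrete. Each hypothetical scenario lists a risk-neutral probability, a discount factor $D = 1/\beta(T)$, and a terminal price $S(T)$, and the sketch evaluates $F(0) = \mathbb E[DS(T)]/\mathbb E[D]$ and $\Phi(0) = \mathbb ES(T)$. The scenario numbers are invented for illustration, and the sketch does not check that they are arbitrage-free.

```python
def forward_and_future(scenarios):
    """scenarios: list of (probability, D, S_T) with D = 1/beta(T).

    Returns (F0, Phi0) with F(0) = E[D S(T)] / E[D] and Phi(0) = E[S(T)].
    """
    ED = sum(p * D for p, D, _ in scenarios)
    EDS = sum(p * D * s for p, D, s in scenarios)
    ES = sum(p * s for p, _, s in scenarios)
    return EDS / ED, ES
```

In the positively correlated case below, a high stock price comes with a large discount factor (low rates), and the future price falls below the forward price, as the text predicts.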


27.6 Backwardation and contango 


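Suppose
$$dS(t) = \mu S(t)\,dt+\sigma S(t)\,dW(t).$$


Define $\theta = \frac{\mu-r}{\sigma}$, $\widetilde W(t) = \theta t+W(t)$,
$$Z(T) = \exp\{-\theta W(T)-\tfrac12\theta^2T\},$$
$$\widetilde{\mathbb P}(A) = \int_AZ(T)\,d\mathbb P,\quad\forall A\in\mathcal F(T).$$
Then $\widetilde W$ is a Brownian motion under $\widetilde{\mathbb P}$, and
$$dS(t) = rS(t)\,dt+\sigma S(t)\,d\widetilde W(t).$$


We have
$$S(T) = S(0)\exp\{(\mu-\tfrac12\sigma^2)T+\sigma W(T)\} = S(0)\exp\{(r-\tfrac12\sigma^2)T+\sigma\widetilde W(T)\}.$$


The expected future spot price of the stock under $\mathbb P$ is
$$\mathbb ES(T) = S(0)e^{\mu T}\,\mathbb E\big[\exp\{-\tfrac12\sigma^2T+\sigma W(T)\}\big] = e^{\mu T}S(0).$$
The future price at time 0 is
$$\Phi(0) = \widetilde{\mathbb E}S(T) = e^{rT}S(0).$$


If $\mu>r$, then $\Phi(0)<\mathbb ES(T)$. This situation is called normal backwardation (see Hull). If $\mu<r$, then $\Phi(0)>\mathbb ES(T)$. This is called contango.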
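In this constant-coefficient setting the comparison only involves $\Phi(0) = e^{rT}S(0)$ and $\mathbb ES(T) = e^{\mu T}S(0)$, so it reduces to the sign of $\mu-r$. The small sketch below just encodes that classification; the function name and numbers are illustrative.

```python
import math


def futures_vs_expected_spot(S0, mu, r, T):
    """Compare Phi(0) = e^{rT} S(0) with E S(T) = e^{mu T} S(0) (constant mu, r, sigma)."""
    futures = S0 * math.exp(r * T)
    expected_spot = S0 * math.exp(mu * T)
    if futures < expected_spot:
        label = "normal backwardation"
    elif futures > expected_spot:
        label = "contango"
    else:
        label = "neither"
    return futures, expected_spot, label
```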


Chapter 28 


Term-structure models 


Throughout this discussion, $\{W(t);\ 0\le t\le T^*\}$ is a Brownian motion on some probability space $(\Omega,\mathcal F,\mathbb P)$, and $\{\mathcal F(t);\ 0\le t\le T^*\}$ is the filtration generated by $W$.


Suppose we are given an adapted interest rate process $\{r(t);\ 0\le t\le T^*\}$. We define the accumulation factor
$$\beta(t) = \exp\Big\{\int_0^tr(u)\,du\Big\},\quad 0\le t\le T^*.$$
In a term-structure model, we take the zero-coupon bonds ("zeroes") of various maturities to be the primitive assets. We assume these bonds are default-free and pay \$1 at maturity. For $0\le t\le T\le T^*$, let
$$B(t,T) = \text{price at time } t \text{ of the zero-coupon bond paying \$1 at time } T.$$


Theorem 0.67 (Fundamental Theorem of Asset Pricing) A term structure model is free of arbitrage if and only if there is a probability measure $\widetilde{\mathbb P}$ on $\Omega$ (a risk-neutral measure) with the same probability-zero sets as $\mathbb P$ (i.e., equivalent to $\mathbb P$), such that for each $T\in(0,T^*]$, the process
$$\frac{B(t,T)}{\beta(t)},\quad 0\le t\le T,$$
is a martingale under $\widetilde{\mathbb P}$.
Remark 28.1 We shall always have
$$dB(t,T) = \mu(t,T)B(t,T)\,dt+\rho(t,T)B(t,T)\,dW(t),\quad 0\le t\le T,$$
for some functions $\mu(t,T)$ and $\rho(t,T)$. Therefore
$$d\Big(\frac{B(t,T)}{\beta(t)}\Big) = \frac{dB(t,T)}{\beta(t)}-r(t)\frac{B(t,T)}{\beta(t)}\,dt = \big(\mu(t,T)-r(t)\big)\frac{B(t,T)}{\beta(t)}\,dt+\rho(t,T)\frac{B(t,T)}{\beta(t)}\,dW(t),$$
so $\mathbb P$ is a risk-neutral measure if and only if $\mu(t,T)$, the mean rate of return of $B(t,T)$ under $\mathbb P$, is the interest rate $r(t)$. If the mean rate of return of $B(t,T)$ under $\mathbb P$ is not $r(t)$ at each time $t$ and for each maturity $T$, we should change to a measure $\widetilde{\mathbb P}$ under which the mean rate of return is $r(t)$. If such a measure does not exist, then the model admits an arbitrage by trading in zero-coupon bonds.


28.1 Computing arbitrage-free bond prices: first method 


Begin with a stochastic differential equation (SDE)
$$dX(t) = a(t,X(t))\,dt+b(t,X(t))\,dW(t).$$


The solution $X(t)$ is the factor. If we want to have $n$ factors, we let $W$ be an $n$-dimensional Brownian motion and let $X$ be an $n$-dimensional process. We let the interest rate $r(t)$ be a function of $X(t)$. In the usual one-factor models, we take $r(t)$ to be $X(t)$ (e.g., Cox-Ingersoll-Ross, Hull-White).


Now that we have an interest rate process $\{r(t);\ 0\le t\le T^*\}$, we define the zero-coupon bond prices to be
$$B(t,T) = \beta(t)\,\mathbb E\Big[\frac1{\beta(T)}\,\Big|\,\mathcal F(t)\Big] = \mathbb E\Big[\exp\Big\{-\int_t^Tr(u)\,du\Big\}\,\Big|\,\mathcal F(t)\Big],\quad 0\le t\le T\le T^*.$$
We showed in Chapter 27 that
$$dB(t,T) = r(t)B(t,T)\,dt+\beta(t)\gamma(t)\,dW(t)$$
for some process $\gamma$. Since $B(t,T)$ has mean rate of return $r(t)$ under $\mathbb P$, $\mathbb P$ is a risk-neutral measure and there is no arbitrage.
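Purely as an illustration of this first method, take the hypothetical factor dynamics $dr = a(b-r)\,dt + \sigma\,dW$, the constant-coefficient special case of the Hull-White model usually called the Vasicek model. For this choice the bond price $B(0,T) = \mathbb E\big[e^{-\int_0^Tr(u)\,du}\big]$ has a well-known closed form (quoted here, not derived in these notes), so a Monte Carlo estimate of the expectation can be compared with it. The parameter values and function names are assumptions. The short rate is simulated with the exact Ornstein-Uhlenbeck transition, and $\int r\,du$ is approximated by the trapezoid rule.

```python
import math
import random


def vasicek_bond_closed_form(r0, a, b, sigma, T):
    """Standard Vasicek zero-coupon price B(0, T) for dr = a(b - r) dt + sigma dW."""
    C = (1.0 - math.exp(-a * T)) / a
    A = (b - sigma ** 2 / (2.0 * a ** 2)) * (C - T) - sigma ** 2 * C ** 2 / (4.0 * a)
    return math.exp(A - C * r0)


def bond_price_mc(r0, a, b, sigma, T, n_steps=100, n_paths=4_000, seed=1):
    """Monte Carlo estimate of B(0, T) = E[exp(-int_0^T r(u) du)]."""
    rng = random.Random(seed)
    dt = T / n_steps
    decay = math.exp(-a * dt)
    step_sd = sigma * math.sqrt((1.0 - decay ** 2) / (2.0 * a))   # exact OU transition
    total = 0.0
    for _ in range(n_paths):
        r = r0
        integral = 0.0
        for _ in range(n_steps):
            r_next = b + (r - b) * decay + step_sd * rng.gauss(0.0, 1.0)
            integral += 0.5 * (r + r_next) * dt                    # trapezoid rule
            r = r_next
        total += math.exp(-integral)
    return total / n_paths
```

Setting `sigma = 0.0` turns the rate into a deterministic function, and a single path then reproduces $\exp\{-\int_0^Tr(u)\,du\}$ up to the trapezoid error.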


28.2 Some interest-rate dependent assets 


Coupon-paying bond: Payments $P_1,P_2,\dots,P_n$ at times $T_1,T_2,\dots,T_n$. Price at time $t$ is
$$\sum_{\{k:\ t<T_k\}}P_kB(t,T_k).$$


Call option on a zero-coupon bond: Bond matures at time $T$. Option expires at time $T_1<T$. Price at time $t$ is
$$\beta(t)\,\mathbb E\Big[\frac{(B(T_1,T)-K)^+}{\beta(T_1)}\,\Big|\,\mathcal F(t)\Big],\quad 0\le t\le T_1.$$
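The coupon-bond formula is a sum over the payments still outstanding, each discounted with the matching zero. The sketch takes the discount function $B(t,T)$ as an argument and, for the usage example only, plugs in a hypothetical flat curve $B(t,T) = e^{-0.04(T-t)}$.

```python
import math


def coupon_bond_price(t, payments, discount):
    """Sum of P_k * B(t, T_k) over the payments with t < T_k.

    payments: iterable of (T_k, P_k); discount(t, T) returns B(t, T).
    """
    return sum(P * discount(t, T) for T, P in payments if t < T)


def flat_discount(t, T, r=0.04):
    """Hypothetical flat curve B(t, T) = exp(-r (T - t))."""
    return math.exp(-r * (T - t))
```

Passing the discount function in keeps the pricing step independent of how $B(t,T)$ was produced, whether from a factor model or from observed zero prices.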
28.3 Terminology


Definition 28.1 (Term-structure model) Any mathematical model which determines, at least theoretically, the stochastic processes
$$B(t,T),\quad 0\le t\le T,$$
for all $T\in(0,T^*]$.


Definition 28.2 (Yield to maturity) For $0\le t<T\le T^*$, the yield to maturity $Y(t,T)$ is the $\mathcal F(t)$-measurable random variable satisfying
$$B(t,T)\exp\{(T-t)Y(t,T)\} = 1,$$
or equivalently,
$$Y(t,T) = -\frac1{T-t}\log B(t,T).$$
Determining
$$B(t,T),\quad 0\le t\le T\le T^*,$$
is equivalent to determining
$$Y(t,T),\quad 0\le t<T\le T^*.$$

28.4 Forward rate agreement 


Let $0\le t\le T<T+\varepsilon\le T^*$ be given. Suppose you want to borrow \$1 at time $T$ with repayment (plus interest) at time $T+\varepsilon$, at an interest rate agreed upon at time $t$. To synthesize a forward-rate agreement to do this, at time $t$ buy a $T$-maturity zero and short $\frac{B(t,T)}{B(t,T+\varepsilon)}$ $(T+\varepsilon)$-maturity zeroes.


The value of this portfolio at time $t$ is
$$B(t,T)-\frac{B(t,T)}{B(t,T+\varepsilon)}B(t,T+\varepsilon) = 0.$$
At time $T$, you receive \$1 from the $T$-maturity zero. At time $T+\varepsilon$, you pay \$$\frac{B(t,T)}{B(t,T+\varepsilon)}$. The effective interest rate on the dollar you receive at time $T$ is $R(t,T,T+\varepsilon)$ given by
$$\frac{B(t,T)}{B(t,T+\varepsilon)} = \exp\{\varepsilon R(t,T,T+\varepsilon)\},$$
or equivalently,
$$R(t,T,T+\varepsilon) = -\frac{\log B(t,T+\varepsilon)-\log B(t,T)}{\varepsilon}.$$
The forward rate is
$$f(t,T) = \lim_{\varepsilon\downarrow0}R(t,T,T+\varepsilon) = -\frac{\partial}{\partial T}\log B(t,T).\qquad(4.1)$$
This is the instantaneous interest rate, agreed upon at time $t$, for money borrowed at time $T$.


Integrating the above equation, we obtain
$$\int_t^Tf(t,u)\,du = -\int_t^T\frac{\partial}{\partial u}\log B(t,u)\,du = -\log B(t,u)\Big|_{u=t}^{u=T} = -\log B(t,T),$$
so
$$B(t,T) = \exp\Big\{-\int_t^Tf(t,u)\,du\Big\}.$$
You can agree at time $t$ to receive interest rate $f(t,u)$ at each time $u\in[t,T]$. If you invest \$$B(t,T)$ at time $t$ and receive interest rate $f(t,u)$ at each time $u$ between $t$ and $T$, this will grow to
$$B(t,T)\exp\Big\{\int_t^Tf(t,u)\,du\Big\} = 1$$
at time $T$.
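The relations between zero prices, forward-rate-agreement rates, instantaneous forward rates and yields can be checked on a hypothetical smooth time-0 curve, here $\log B(0,T) = -(0.02T + 0.005T^2)$, for which $f(0,T) = 0.02 + 0.01T$ and $Y(0,T) = 0.02 + 0.005T$. In the sketch, the derivative in (4.1) is a central finite difference, and $B(0,T)$ is rebuilt from the forward rates with the trapezoid rule. The curve and all function names are assumptions made for illustration.

```python
import math


def log_B(T):
    """Hypothetical time-0 discount curve: log B(0, T) = -(0.02 T + 0.005 T^2)."""
    return -(0.02 * T + 0.005 * T * T)


def fra_rate(T, eps):
    """R(0, T, T + eps) = -(log B(0, T + eps) - log B(0, T)) / eps."""
    return -(log_B(T + eps) - log_B(T)) / eps


def forward_rate(T, h=1e-5):
    """f(0, T) = -d/dT log B(0, T), by a central finite difference."""
    return -(log_B(T + h) - log_B(T - h)) / (2.0 * h)


def yield_to_maturity(T):
    """Y(0, T) = -log B(0, T) / T."""
    return -log_B(T) / T


def bond_from_forwards(T, n=1_000):
    """exp(-int_0^T f(0, u) du) with the trapezoid rule; should reproduce B(0, T)."""
    h = T / n
    integral = sum(0.5 * h * (forward_rate(i * h) + forward_rate((i + 1) * h))
                   for i in range(n))
    return math.exp(-integral)
```

As $\varepsilon$ shrinks, `fra_rate(T, eps)` approaches `forward_rate(T)`, which is the limit in (4.1).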


28.5 Recovering the interest rate r(t) from the forward rate


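$$-\frac{\partial}{\partial T}B(t,T)\Big|_{T=t} = \mathbb E\Big[r(T)\exp\Big\{-\int_t^Tr(u)\,du\Big\}\,\Big|\,\mathcal F(t)\Big]\Big|_{T=t} = \mathbb E\big[r(t)\,\big|\,\mathcal F(t)\big] = r(t).$$
On the other hand,
$$B(t,T) = \exp\Big\{-\int_t^Tf(t,u)\,du\Big\},$$
$$\frac{\partial}{\partial T}B(t,T) = -f(t,T)\exp\Big\{-\int_t^Tf(t,u)\,du\Big\},$$
$$\frac{\partial}{\partial T}B(t,T)\Big|_{T=t} = -f(t,t).$$


Conclusion: $r(t) = f(t,t)$.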
28.6 Computing arbitrage-free bond prices: Heath-Jarrow-Morton method


For each T € (0, T'*], let the forward rate be given by 


10,1) = 10,0)+ f alu T) aut f out) dW(u), 0<t<T. 


Here {a(u, T); 0 < u < T} and {o(u, T); 0 < u < T} are adapted processes. 
In other words, 
df(t,T) = a(t,T) dt + o(t,T) dW (t). 


DUE = eol- f fat u) de). 


Recall that 


T 
=i) a- f [a(t, u) dt + o (t, u) dW (t)] du 


A aie IES a dt — [owe J dW (t) 


a*(t,T) o*(t,T) 
= r(t) dt — a*(t,T) dt — o*(t, T) dW(t). 


Let 
g(x) =e", g'(x) =e", g (e) =e” 
Then 
E 
B(t,T)=3 (-/ flt,u) in) , 
and 


=g (- f 1 r dt — o* dt — o* dW) 
+ 


s" (- i 0) du) (0 
30 (- [se 
t 
1 ( 
3 ( 


= B(t,T) [© - a(t, T) + 
—o*(t,T)B(t,T) dW (t). 


arr 
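28.7 Checking for absence of arbitrage

\(\mathbb{P}\) is a risk-neutral measure if and only if every zero-coupon bond has mean rate of return \(r(t)\) under \(\mathbb{P}\), i.e., \(\alpha^*(t,T) = \frac12(\sigma^*(t,T))^2\):

\[
\int_t^T \alpha(t,u)\,du = \frac12\left[\int_t^T \sigma(t,u)\,du\right]^2, \quad 0 \le t \le T \le T^*. \tag{7.1}
\]

Differentiating this with respect to \(T\), we obtain

\[
\alpha(t,T) = \sigma(t,T)\int_t^T \sigma(t,u)\,du, \quad 0 \le t \le T \le T^*. \tag{7.2}
\]

Not only does (7.1) imply (7.2), (7.2) also implies (7.1). This will be a homework problem.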
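The equivalence of (7.1) and (7.2) can be spot-checked numerically. The sketch below assumes a hypothetical exponentially decaying volatility \(\sigma(t,T) = s\,e^{-\lambda(T-t)}\), which is not from the text. It builds \(\alpha(t,T)\) from (7.2) and compares the two sides of (7.1), doing every integral with the composite Simpson rule.

```python
import math

# Hypothetical forward-rate volatility sigma(t,T) = S * exp(-LAM * (T - t)) (an assumption).
S, LAM = 0.01, 0.3

def sigma(t, T):
    return S * math.exp(-LAM * (T - t))

def simpson(fn, a, b, n=200):
    """Composite Simpson rule on [a, b] with an even number n of panels."""
    if b == a:
        return 0.0
    h = (b - a) / n
    total = fn(a) + fn(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * fn(a + k * h)
    return total * h / 3.0

def sigma_star(t, T):
    """sigma*(t,T) = int_t^T sigma(t,u) du."""
    return simpson(lambda u: sigma(t, u), t, T)

def alpha_from_72(t, T):
    """Drift required by (7.2): alpha(t,T) = sigma(t,T) * int_t^T sigma(t,u) du."""
    return sigma(t, T) * sigma_star(t, T)

t, T = 1.0, 6.0
lhs = simpson(lambda u: alpha_from_72(t, u), t, T)   # left side of (7.1)
rhs = 0.5 * sigma_star(t, T) ** 2                    # right side of (7.1)
print(lhs, rhs)
```

Both sides agree to quadrature accuracy. For this volatility they also match the closed form \(\frac12\big(s(1-e^{-\lambda(T-t)})/\lambda\big)^2\).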


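Suppose (7.1) does not hold. Then \(\mathbb{P}\) is not a risk-neutral measure, but there might still be a risk-neutral measure. Let \(\{\theta(t);\ 0 \le t \le T^*\}\) be an adapted process, and define

\[
\widetilde{W}(t) = \int_0^t \theta(u)\,du + W(t), \qquad
Z(t) = \exp\left\{-\int_0^t \theta(u)\,dW(u) - \frac12\int_0^t \theta^2(u)\,du\right\}, \qquad
\widetilde{\mathbb{P}}(A) = \int_A Z(T^*)\,d\mathbb{P}.
\]

By Girsanov's theorem (under suitable integrability conditions on \(\theta\)), \(\widetilde{W}\) is a Brownian motion under \(\widetilde{\mathbb{P}}\). Then

\[
\begin{aligned}
dB(t,T) &= B(t,T)\left[r(t) - \alpha^*(t,T) + \frac12\left(\sigma^*(t,T)\right)^2\right] dt - \sigma^*(t,T)B(t,T)\,dW(t) \\
&= B(t,T)\left[r(t) - \alpha^*(t,T) + \frac12\left(\sigma^*(t,T)\right)^2 + \sigma^*(t,T)\theta(t)\right] dt - \sigma^*(t,T)B(t,T)\,d\widetilde{W}(t), \quad 0 \le t \le T.
\end{aligned}
\]

In order for \(B(t,T)\) to have mean rate of return \(r(t)\) under \(\widetilde{\mathbb{P}}\), we must have

\[
\alpha^*(t,T) = \frac12\left(\sigma^*(t,T)\right)^2 + \sigma^*(t,T)\theta(t), \quad 0 \le t \le T \le T^*. \tag{7.3}
\]

Differentiation with respect to \(T\) yields the equivalent condition

\[
\alpha(t,T) = \sigma(t,T)\sigma^*(t,T) + \sigma(t,T)\theta(t), \quad 0 \le t \le T \le T^*. \tag{7.4}
\]

Theorem 7.68 (Heath-Jarrow-Morton) For each \(T \in (0,T^*]\), let \(\alpha(u,T)\), \(0 \le u \le T\), and \(\sigma(u,T)\), \(0 \le u \le T\), be adapted processes, and assume \(\sigma(u,T) > 0\) for all \(u\) and \(T\). Let \(f(0,T)\), \(0 \le T \le T^*\), be a deterministic function, and define

\[
f(t,T) = f(0,T) + \int_0^t \alpha(u,T)\,du + \int_0^t \sigma(u,T)\,dW(u).
\]

Then \(f(t,T)\), \(0 \le t \le T \le T^*\), is a family of forward rate processes for a term-structure model without arbitrage if and only if there is an adapted process \(\theta(t)\), \(0 \le t \le T^*\), satisfying (7.3), or equivalently, satisfying (7.4).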


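Remark 28.2 Under \(\mathbb{P}\), the zero-coupon bond with maturity \(T\) has mean rate of return

\[
r(t) - \alpha^*(t,T) + \frac12\left(\sigma^*(t,T)\right)^2
\]

and volatility \(\sigma^*(t,T)\). The excess mean rate of return, above the interest rate, is

\[
-\alpha^*(t,T) + \frac12\left(\sigma^*(t,T)\right)^2,
\]

and when normalized by the volatility, this becomes the market price of risk

\[
\frac{-\alpha^*(t,T) + \frac12\left(\sigma^*(t,T)\right)^2}{\sigma^*(t,T)}.
\]

The no-arbitrage condition is that this market price of risk at time \(t\) does not depend on the maturity \(T\) of the bond. We can then set

\[
\theta(t) = -\left[\frac{-\alpha^*(t,T) + \frac12\left(\sigma^*(t,T)\right)^2}{\sigma^*(t,T)}\right],
\]

and (7.3) is satisfied.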
(The remainder of this chapter was taught Mar 21) 


Suppose the market price of risk does not depend on the maturity \(T\), so we can solve (7.3) for \(\theta\). Plugging this into the stochastic differential equation for \(B(t,T)\), we obtain for every maturity \(T\):

\[
dB(t,T) = r(t)B(t,T)\,dt - \sigma^*(t,T)B(t,T)\,d\widetilde{W}(t).
\]

Because (7.4) is equivalent to (7.3), we may plug (7.4) into the stochastic differential equation for \(f(t,T)\) to obtain, for every maturity \(T\):

\[
\begin{aligned}
df(t,T) &= \left[\sigma(t,T)\sigma^*(t,T) + \sigma(t,T)\theta(t)\right] dt + \sigma(t,T)\,dW(t) \\
&= \sigma(t,T)\sigma^*(t,T)\,dt + \sigma(t,T)\,d\widetilde{W}(t).
\end{aligned}
\]


28.8 Implementation of the Heath-Jarrow-Morton model 


Choose

\[
\sigma(t,T), \quad 0 \le t \le T \le T^*, \qquad \theta(t), \quad 0 \le t \le T^*.
\]

These may be stochastic processes, but are usually taken to be deterministic functions. Define

\[
\alpha(t,T) = \sigma(t,T)\sigma^*(t,T) + \sigma(t,T)\theta(t),
\]

where \(\sigma^*(t,T) = \int_t^T \sigma(t,u)\,du\). Let \(f(0,T)\), \(0 \le T \le T^*\), be determined by the market; recall from equation (4.1):

\[
f(0,T) = -\frac{\partial}{\partial T}\log B(0,T), \quad 0 \le T \le T^*.
\]

Then \(f(t,T)\) for \(0 \le t \le T\) is determined by the equation

\[
df(t,T) = \sigma(t,T)\sigma^*(t,T)\,dt + \sigma(t,T)\,d\widetilde{W}(t), \tag{8.1}
\]

this determines the interest rate process

\[
r(t) = f(t,t), \quad 0 \le t \le T^*, \tag{8.2}
\]

and then the zero-coupon bond prices are determined by the initial conditions \(B(0,T)\), \(0 \le T \le T^*\), obtained from the market, combined with the stochastic differential equation

\[
dB(t,T) = r(t)B(t,T)\,dt - \sigma^*(t,T)B(t,T)\,d\widetilde{W}(t). \tag{8.3}
\]

Because all pricing of interest rate dependent assets will be done under the risk-neutral measure \(\widetilde{\mathbb{P}}\), under which \(\widetilde{W}\) is a Brownian motion, we have written (8.1) and (8.3) in terms of \(\widetilde{W}\) rather than \(W\). Written this way, it is apparent that neither \(\theta(t)\) nor \(\alpha(t,T)\) will enter subsequent computations. The only process which matters is \(\sigma(t,T)\), \(0 \le t \le T \le T^*\), and the process

\[
\sigma^*(t,T) = \int_t^T \sigma(t,u)\,du, \quad 0 \le t \le T \le T^*, \tag{8.4}
\]

obtained from \(\sigma(t,T)\).

From (8.3) we see that \(\sigma^*(t,T)\) is the volatility at time \(t\) of the zero-coupon bond maturing at time \(T\). Equation (8.4) implies

\[
\sigma^*(T,T) = 0, \quad 0 \le T \le T^*. \tag{8.5}
\]

This is because \(B(T,T) = 1\), and so as \(t\) approaches \(T\) (from below), the volatility in \(B(t,T)\) must vanish.

In conclusion, to implement the HJM model, it suffices to have the initial market data \(B(0,T)\), \(0 \le T \le T^*\), and the volatilities

\[
\sigma^*(t,T), \quad 0 \le t \le T \le T^*.
\]

We require that \(\sigma^*(t,T)\) be differentiable in \(T\) and satisfy (8.5). We can then define

\[
\sigma(t,T) = \frac{\partial}{\partial T}\sigma^*(t,T),
\]

and (8.4) will be satisfied because

\[
\sigma^*(t,T) = \sigma^*(t,T) - \sigma^*(t,t) = \int_t^T \frac{\partial}{\partial u}\sigma^*(t,u)\,du = \int_t^T \sigma(t,u)\,du.
\]


We then let \(\widetilde{W}\) be a Brownian motion under a probability measure \(\widetilde{\mathbb{P}}\), and we let \(B(t,T)\), \(0 \le t \le T \le T^*\), be given by (8.3), where \(r(t)\) is given by (8.2) and \(f(t,T)\) by (8.1). In (8.1) we use the initial conditions

\[
f(0,T) = -\frac{\partial}{\partial T}\log B(0,T), \quad 0 \le T \le T^*.
\]
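The following Monte Carlo sketch shows the recipe (8.1)-(8.3) in action. It assumes, beyond the text, a constant forward-rate volatility \(\sigma(t,T) = \sigma\), so that \(\sigma^*(t,T) = \sigma(T-t)\), and a hypothetical market curve with \(f(0,T) = 0.03 + 0.005T\). The whole forward curve is stepped with an Euler scheme under the risk-neutral Brownian motion, and the short rate is read off as \(r(t) = f(t,t)\). The simulated \(\widetilde{\mathbb{E}}\exp\{-\int_0^T r(u)\,du\}\) should then reproduce the input \(B(0,T)\), up to Monte Carlo and discretization error.

```python
import numpy as np

# Assumptions for this sketch (not from the text): constant forward-rate volatility
# sigma(t,T) = SIGMA, so sigma*(t,T) = SIGMA * (T - t), and a hypothetical market
# curve with f(0,T) = 0.03 + 0.005*T, i.e. B(0,T) = exp(-(0.03*T + 0.0025*T**2)).
SIGMA, T_END, N_STEPS, N_PATHS = 0.01, 5.0, 100, 20_000
dt = T_END / N_STEPS
grid = np.linspace(0.0, T_END, N_STEPS + 1)       # time points t_i double as maturities T_j

def f0(T):
    return 0.03 + 0.005 * T

def market_discount(T):
    return np.exp(-(0.03 * T + 0.0025 * T**2))

rng = np.random.default_rng(0)
f = np.tile(f0(grid), (N_PATHS, 1))               # f[:, j] holds f(t, T_j), starting from f(0, T_j)
short_rate = np.empty((N_PATHS, N_STEPS + 1))
for i, t in enumerate(grid):
    short_rate[:, i] = f[:, i]                    # (8.2): r(t_i) = f(t_i, t_i)
    if i == N_STEPS:
        break
    dW = rng.standard_normal(N_PATHS) * np.sqrt(dt)
    drift = SIGMA * SIGMA * (grid - t)            # sigma(t,T_j) * sigma*(t,T_j); unused for T_j < t
    f += drift * dt + SIGMA * dW[:, None]         # Euler step of (8.1), driven by the risk-neutral W

# Price of the T_END-maturity zero as E[exp(-int_0^T r(u) du)], integral by the trapezoid rule.
integral = dt * (short_rate.sum(axis=1) - 0.5 * (short_rate[:, 0] + short_rate[:, -1]))
mc_price = np.exp(-integral).mean()
print(mc_price, market_discount(T_END))
```

Note that the drift of the forward rate is fixed entirely by the volatility, as the text points out: neither \(\theta\) nor \(\alpha\) appears anywhere in the simulation.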


Remark 28.3 It is customary in the literature to write \(W\) rather than \(\widetilde{W}\) and \(\mathbb{P}\) rather than \(\widetilde{\mathbb{P}}\), so that \(\mathbb{P}\) is the symbol used for the risk-neutral measure and no reference is ever made to the market measure. The only parameter which must be estimated from the market is the bond volatility \(\sigma^*(t,T)\), and volatility is unaffected by the change of measure.
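Chapter 29

Gaussian processes

Definition 29.1 (Gaussian Process) A Gaussian process \(X(t)\), \(t \ge 0\), is a stochastic process with the property that for every set of times \(0 \le t_1 \le t_2 \le \dots \le t_n\), the set of random variables

\[
X(t_1), X(t_2), \dots, X(t_n)
\]

is jointly normally distributed.

Remark 29.1 If \(X\) is a Gaussian process, then its distribution is determined by its mean function

\[
m(t) = \mathbb{E}X(t)
\]

and its covariance function

\[
\rho(s,t) = \mathbb{E}\left[(X(s) - m(s))\,(X(t) - m(t))\right].
\]

Indeed, when the covariance matrix \(\Sigma\) below is nonsingular, the joint density of \(X(t_1), \dots, X(t_n)\) is

\[
\mathbb{P}\{X(t_1) \in dx_1, \dots, X(t_n) \in dx_n\}
= \frac{1}{(2\pi)^{n/2}\sqrt{\det\Sigma}}\exp\left\{-\frac12\,(x - m(t))\,\Sigma^{-1}\,(x - m(t))^{\mathsf T}\right\} dx_1 \dots dx_n,
\]

where \(\Sigma\) is the covariance matrix

\[
\Sigma = \begin{bmatrix}
\rho(t_1,t_1) & \rho(t_1,t_2) & \dots & \rho(t_1,t_n) \\
\rho(t_2,t_1) & \rho(t_2,t_2) & \dots & \rho(t_2,t_n) \\
\vdots & \vdots & & \vdots \\
\rho(t_n,t_1) & \rho(t_n,t_2) & \dots & \rho(t_n,t_n)
\end{bmatrix},
\]

\(x\) is the row vector \([x_1, x_2, \dots, x_n]\), \(t\) is the row vector \([t_1, t_2, \dots, t_n]\), and \(m(t) = [m(t_1), m(t_2), \dots, m(t_n)]\).

The moment generating function is

\[
\mathbb{E}\exp\left\{\sum_{k=1}^n u_k X(t_k)\right\} = \exp\left\{u\cdot m(t)^{\mathsf T} + \frac12\,u\,\Sigma\,u^{\mathsf T}\right\},
\]

where \(u = [u_1, u_2, \dots, u_n]\).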
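To make the density formula of Remark 29.1 concrete, here is a short sketch that evaluates it with NumPy. As a test case it uses Brownian motion, anticipating Section 29.1 (\(m(t) = 0\), \(\rho(s,t) = s \wedge t\)). The result is cross-checked against the product of the densities of the independent increments, which must give the same joint density. The times and the point \(x\) are arbitrary illustrative choices.

```python
import numpy as np

def gaussian_density(x, mean, cov):
    """Joint normal density of Remark 29.1; cov must be nonsingular."""
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    n = diff.size
    quad = diff @ np.linalg.solve(cov, diff)          # (x - m) Sigma^{-1} (x - m)^T
    return float(np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** n * np.linalg.det(cov)))

def normal_pdf(z, var):
    return np.exp(-z * z / (2 * var)) / np.sqrt(2 * np.pi * var)

# Example: Brownian motion at three times, with m(t) = 0 and rho(s,t) = min(s,t).
times = np.array([0.5, 1.0, 2.0])
cov = np.minimum.outer(times, times)                  # Sigma[i, j] = rho(t_i, t_j)
x = np.array([0.3, -0.2, 0.4])

joint = gaussian_density(x, np.zeros(3), cov)
# Cross-check: the same density written via independent Gaussian increments.
increments = np.diff(np.concatenate(([0.0], x)))
gaps = np.diff(np.concatenate(([0.0], times)))
via_increments = float(np.prod(normal_pdf(increments, gaps)))
print(joint, via_increments)
```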
29.1 An example: Brownian Motion

Brownian motion \(W\) is a Gaussian process with \(m(t) = 0\) and \(\rho(s,t) = s \wedge t\). Indeed, if \(0 \le s \le t\), then

\[
\begin{aligned}
\rho(s,t) = \mathbb{E}\left[W(s)W(t)\right] &= \mathbb{E}\left[W(s)\left(W(t) - W(s)\right) + W^2(s)\right] \\
&= \mathbb{E}W(s)\cdot\mathbb{E}\left(W(t) - W(s)\right) + \mathbb{E}W^2(s) \\
&= \mathbb{E}W^2(s) \\
&= s \wedge t.
\end{aligned}
\]

To prove that a process is Gaussian, one must show that \(X(t_1), \dots, X(t_n)\) has either a density or a moment generating function of the appropriate form. We shall use the m.g.f., and shall cheat a bit by considering only two times, which we usually call \(s\) and \(t\). We will want to show that

\[
\mathbb{E}\exp\{u_1X(s) + u_2X(t)\} = \exp\left\{u_1m_1 + u_2m_2 + \frac12\begin{bmatrix}u_1 & u_2\end{bmatrix}\begin{bmatrix}\sigma_{11} & \sigma_{12}\\ \sigma_{21} & \sigma_{22}\end{bmatrix}\begin{bmatrix}u_1\\ u_2\end{bmatrix}\right\}.
\]


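Theorem 1.69 (Integral w.r.t. a Brownian) Let \(W(t)\) be a Brownian motion and \(\delta(t)\) a nonrandom function. Then

\[
X(t) = \int_0^t \delta(u)\,dW(u)
\]

is a Gaussian process with \(m(t) = 0\) and

\[
\rho(s,t) = \int_0^{s\wedge t} \delta^2(u)\,du.
\]

Proof: (Sketch.) We have

\[
dX = \delta\,dW.
\]

Therefore,

\[
\begin{aligned}
d e^{uX(s)} &= u\,e^{uX(s)}\delta(s)\,dW(s) + \frac12 u^2 e^{uX(s)}\delta^2(s)\,ds, \\
e^{uX(s)} &= e^{uX(0)} + u\int_0^s e^{uX(v)}\delta(v)\,dW(v) + \frac12 u^2\int_0^s e^{uX(v)}\delta^2(v)\,dv, \\
\mathbb{E}e^{uX(s)} &= 1 + \frac12 u^2\int_0^s \delta^2(v)\,\mathbb{E}e^{uX(v)}\,dv, \\
\frac{d}{ds}\mathbb{E}e^{uX(s)} &= \frac12 u^2\delta^2(s)\,\mathbb{E}e^{uX(s)},
\end{aligned}
\]

\[
\mathbb{E}e^{uX(s)} = e^{uX(0)}\exp\left\{\frac12 u^2\int_0^s \delta^2(v)\,dv\right\} = \exp\left\{\frac12 u^2\int_0^s \delta^2(v)\,dv\right\}. \tag{1.1}
\]

This shows that \(X(s)\) is normal with mean 0 and variance \(\int_0^s \delta^2(v)\,dv\).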
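Now let \(0 \le s < t\) be given. Just as before,

\[
d e^{uX(t)} = u\,e^{uX(t)}\delta(t)\,dW(t) + \frac12 u^2 e^{uX(t)}\delta^2(t)\,dt.
\]

Integrate from \(s\) to \(t\) to get

\[
e^{uX(t)} = e^{uX(s)} + u\int_s^t \delta(v)e^{uX(v)}\,dW(v) + \frac12 u^2\int_s^t \delta^2(v)e^{uX(v)}\,dv.
\]

Take conditional expectations \(\mathbb{E}[\,\cdot\,|\mathcal{F}(s)]\) and use the martingale property

\[
\mathbb{E}\left[\int_s^t \delta(v)e^{uX(v)}\,dW(v)\,\Big|\,\mathcal{F}(s)\right]
= \mathbb{E}\left[\int_0^t \delta(v)e^{uX(v)}\,dW(v)\,\Big|\,\mathcal{F}(s)\right] - \int_0^s \delta(v)e^{uX(v)}\,dW(v) = 0
\]

to get

\[
\mathbb{E}\left[e^{uX(t)}\,\big|\,\mathcal{F}(s)\right] = e^{uX(s)} + \frac12 u^2\int_s^t \delta^2(v)\,\mathbb{E}\left[e^{uX(v)}\,\big|\,\mathcal{F}(s)\right] dv,
\]
\[
\frac{d}{dt}\mathbb{E}\left[e^{uX(t)}\,\big|\,\mathcal{F}(s)\right] = \frac12 u^2\delta^2(t)\,\mathbb{E}\left[e^{uX(t)}\,\big|\,\mathcal{F}(s)\right], \quad t \ge s.
\]

The solution to this ordinary differential equation with initial time \(s\) is

\[
\mathbb{E}\left[e^{uX(t)}\,\big|\,\mathcal{F}(s)\right] = e^{uX(s)}\exp\left\{\frac12 u^2\int_s^t \delta^2(v)\,dv\right\}, \quad t \ge s. \tag{1.2}
\]

We now compute the m.g.f. for \((X(s), X(t))\), where \(0 \le s \le t\):

\[
\begin{aligned}
\mathbb{E}\left[e^{u_1X(s) + u_2X(t)}\,\big|\,\mathcal{F}(s)\right]
&= e^{u_1X(s)}\,\mathbb{E}\left[e^{u_2X(t)}\,\big|\,\mathcal{F}(s)\right] \\
&= e^{(u_1+u_2)X(s)}\exp\left\{\frac12 u_2^2\int_s^t \delta^2(v)\,dv\right\},
\end{aligned}
\]

\[
\begin{aligned}
\mathbb{E}\left[e^{u_1X(s) + u_2X(t)}\right]
&= \mathbb{E}\left[\mathbb{E}\left[e^{u_1X(s) + u_2X(t)}\,\big|\,\mathcal{F}(s)\right]\right] \\
&= \mathbb{E}\left[e^{(u_1+u_2)X(s)}\right]\exp\left\{\frac12 u_2^2\int_s^t \delta^2(v)\,dv\right\} \\
&= \exp\left\{\frac12 (u_1+u_2)^2\int_0^s \delta^2(v)\,dv + \frac12 u_2^2\int_s^t \delta^2(v)\,dv\right\} \\
&= \exp\left\{\frac12 (u_1^2 + 2u_1u_2)\int_0^s \delta^2(v)\,dv + \frac12 u_2^2\int_0^t \delta^2(v)\,dv\right\} \\
&= \exp\left\{\frac12\begin{bmatrix}u_1 & u_2\end{bmatrix}
\begin{bmatrix}\int_0^s \delta^2(v)\,dv & \int_0^s \delta^2(v)\,dv \\ \int_0^s \delta^2(v)\,dv & \int_0^t \delta^2(v)\,dv\end{bmatrix}
\begin{bmatrix}u_1\\ u_2\end{bmatrix}\right\}.
\end{aligned}
\]

This shows that \((X(s), X(t))\) is jointly normal with \(\mathbb{E}X(s) = \mathbb{E}X(t) = 0\),

\[
\mathbb{E}X^2(s) = \int_0^s \delta^2(v)\,dv, \qquad \mathbb{E}X^2(t) = \int_0^t \delta^2(v)\,dv, \qquad \mathbb{E}[X(s)X(t)] = \int_0^s \delta^2(v)\,dv.
\]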
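Remark 29.2 The hard part of the above argument, and the reason we use moment generating functions, is to prove the normality. The computation of means and variances does not require the use of moment generating functions. Indeed,

\[
X(t) = \int_0^t \delta(u)\,dW(u)
\]

is a martingale and \(X(0) = 0\), so

\[
m(t) = \mathbb{E}X(t) = X(0) = 0.
\]

For fixed \(s \ge 0\), the Itô isometry gives

\[
\mathbb{E}X^2(s) = \int_0^s \delta^2(v)\,dv,
\]

and for \(0 \le s \le t\), the martingale property gives

\[
\mathbb{E}[X(s)X(t)] = \mathbb{E}\left[X(s)\,\mathbb{E}[X(t)\,|\,\mathcal{F}(s)]\right] = \mathbb{E}X^2(s).
\]

Therefore,

\[
\rho(s,t) = \mathbb{E}[X(s)X(t)] = \int_0^{s\wedge t} \delta^2(v)\,dv.
\]

If \(\delta\) were a stochastic process, the Itô isometry says

\[
\mathbb{E}X^2(s) = \int_0^s \mathbb{E}\delta^2(v)\,dv,
\]

and the same argument used above shows that for \(0 \le s \le t\),

\[
\mathbb{E}[X(s)X(t)] = \mathbb{E}X^2(s) = \int_0^s \mathbb{E}\delta^2(v)\,dv.
\]

However, when \(\delta\) is stochastic, \(X\) is not necessarily a Gaussian process, so its distribution is not determined from its mean and covariance functions.

Remark 29.3 When \(\delta\) is nonrandom,

\[
X(t) = \int_0^t \delta(u)\,dW(u)
\]

is also Markov. We proved this before, but note again that the Markov property follows immediately from (1.2). The equation (1.2) says that conditioned on \(\mathcal{F}(s)\), the distribution of \(X(t)\) depends only on \(X(s)\); in fact, \(X(t)\) is normal with mean \(X(s)\) and variance \(\int_s^t \delta^2(v)\,dv\).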
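Figure 29.1: Range of values of y, z, v for the integrals in the proof of Theorem 1.70.

Theorem 1.70 Let \(W(t)\) be a Brownian motion, and let \(\delta(t)\) and \(h(t)\) be nonrandom functions. Define

\[
X(t) = \int_0^t \delta(u)\,dW(u), \qquad Y(t) = \int_0^t h(u)X(u)\,du.
\]

Then \(Y\) is a Gaussian process with mean function \(m_Y(t) = 0\) and covariance function

\[
\rho_Y(s,t) = \int_0^{s\wedge t} \delta^2(v)\left(\int_v^s h(y)\,dy\right)\left(\int_v^t h(y)\,dy\right) dv. \tag{1.3}
\]

Proof: (Partial) Computation of \(\rho_Y(s,t)\): Let \(0 \le s \le t\) be given. It is shown in a homework problem that \((Y(s), Y(t))\) is a jointly normal pair of random variables. Here we observe that

\[
m_Y(t) = \mathbb{E}Y(t) = \int_0^t h(u)\,\mathbb{E}X(u)\,du = 0,
\]

and we verify that (1.3) holds. We have

\[
\begin{aligned}
\rho_Y(s,t) &= \mathbb{E}[Y(s)Y(t)] \\
&= \mathbb{E}\left[\int_0^s h(y)X(y)\,dy\cdot\int_0^t h(z)X(z)\,dz\right] \\
&= \int_0^s\int_0^t h(y)h(z)\,\mathbb{E}[X(y)X(z)]\,dy\,dz \\
&= \int_0^s\int_0^t h(y)h(z)\int_0^{y\wedge z}\delta^2(v)\,dv\,dy\,dz \\
&= \int_0^s \delta^2(v)\left(\int_v^s h(y)\,dy\right)\left(\int_v^t h(z)\,dz\right) dv,
\end{aligned}
\]

where the last step interchanges the order of integration (see Figure 29.1).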
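Formula (1.3) is easy to evaluate by nested quadrature. The sketch below implements it for arbitrary nonrandom \(\delta\) and \(h\) passed as Python functions, using composite Simpson rules. It then checks the special case \(\delta = h = 1\), where \(Y(t) = \int_0^t W(u)\,du\). In that case (1.3) reduces to \(\int_0^s (s-v)(t-v)\,dv = \frac{s^2t}{2} - \frac{s^3}{6}\) for \(s \le t\). The choice of test case is ours, not the text's.

```python
def simpson(fn, a, b, n=200):
    """Composite Simpson rule on [a, b]; n must be even."""
    if b == a:
        return 0.0
    h = (b - a) / n
    total = fn(a) + fn(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * fn(a + k * h)
    return total * h / 3.0

def rho_Y(s, t, delta, h):
    """Covariance (1.3): int_0^{min(s,t)} delta(v)^2 (int_v^s h)(int_v^t h) dv."""
    def integrand(v):
        return delta(v) ** 2 * simpson(h, v, s) * simpson(h, v, t)
    return simpson(integrand, 0.0, min(s, t))

# Special case delta = h = 1, i.e. Y(t) = int_0^t W(u) du.
one = lambda u: 1.0
s, t = 1.0, 2.0
print(rho_Y(s, t, one, one), s * s * t / 2 - s**3 / 6)   # both approximately 0.8333
```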


Remark 29.4 Unlike the process \(X(t) = \int_0^t \delta(u)\,dW(u)\), the process \(Y(t) = \int_0^t h(u)X(u)\,du\) is neither Markov nor a martingale. For \(0 \le s < t\),

\[
\begin{aligned}
\mathbb{E}[Y(t)\,|\,\mathcal{F}(s)] &= \int_0^s h(u)X(u)\,du + \mathbb{E}\left[\int_s^t h(u)X(u)\,du\,\Big|\,\mathcal{F}(s)\right] \\
&= Y(s) + \int_s^t h(u)\,\mathbb{E}[X(u)\,|\,\mathcal{F}(s)]\,du \\
&= Y(s) + X(s)\int_s^t h(u)\,du,
\end{aligned}
\]

where we have used the fact that \(X\) is a martingale. The conditional expectation \(\mathbb{E}[Y(t)\,|\,\mathcal{F}(s)]\) is not equal to \(Y(s)\), nor is it a function of \(Y(s)\) alone.
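Chapter 30

Hull and White model

Consider

\[
dr(t) = \left(\alpha(t) - \beta(t)r(t)\right) dt + \sigma(t)\,dW(t),
\]

where \(\alpha(t)\), \(\beta(t)\) and \(\sigma(t)\) are nonrandom functions of \(t\).

We can solve the stochastic differential equation. Set

\[
K(t) = \int_0^t \beta(u)\,du.
\]

Then

\[
d\left(e^{K(t)}r(t)\right) = e^{K(t)}\left(\beta(t)r(t)\,dt + dr(t)\right) = e^{K(t)}\left(\alpha(t)\,dt + \sigma(t)\,dW(t)\right).
\]

Integrating, we get

\[
e^{K(t)}r(t) = r(0) + \int_0^t e^{K(u)}\alpha(u)\,du + \int_0^t e^{K(u)}\sigma(u)\,dW(u),
\]

so

\[
r(t) = e^{-K(t)}\left[r(0) + \int_0^t e^{K(u)}\alpha(u)\,du + \int_0^t e^{K(u)}\sigma(u)\,dW(u)\right].
\]

From Theorem 1.69 in Chapter 29, we see that \(r(t)\) is a Gaussian process with mean function

\[
m_r(t) = e^{-K(t)}\left[r(0) + \int_0^t e^{K(u)}\alpha(u)\,du\right] \tag{0.1}
\]

and covariance function

\[
\rho_r(s,t) = e^{-K(s)-K(t)}\int_0^{s\wedge t} e^{2K(u)}\sigma^2(u)\,du. \tag{0.2}
\]

The process \(r(t)\) is also Markov.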
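As a sanity check on (0.1) and (0.2), the sketch below evaluates both integrals by Simpson's rule for hypothetical constant coefficients \(\alpha\), \(\beta\), \(\sigma\). It compares them with the familiar constant-coefficient closed forms \(m_r(t) = r(0)e^{-\beta t} + \frac{\alpha}{\beta}(1-e^{-\beta t})\) and \(\rho_r(t,t) = \frac{\sigma^2}{2\beta}(1-e^{-2\beta t})\). The parameter values are illustrative assumptions. For time-dependent coefficients, only `K` and the integrands would change.

```python
import math

# Hypothetical constant coefficients (assumptions for illustration only).
ALPHA, BETA, SIGMA, R0 = 0.02, 0.5, 0.01, 0.03

def K(t):
    return BETA * t                                   # K(t) = int_0^t beta(u) du

def simpson(fn, a, b, n=400):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = fn(a) + fn(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * fn(a + k * h)
    return total * h / 3.0

def mean_r(t):
    """(0.1): m_r(t) = e^{-K(t)} [r(0) + int_0^t e^{K(u)} alpha(u) du]."""
    return math.exp(-K(t)) * (R0 + simpson(lambda u: math.exp(K(u)) * ALPHA, 0.0, t))

def cov_r(s, t):
    """(0.2): rho_r(s,t) = e^{-K(s)-K(t)} int_0^{min(s,t)} e^{2K(u)} sigma(u)^2 du."""
    m = min(s, t)
    return math.exp(-K(s) - K(t)) * simpson(lambda u: math.exp(2 * K(u)) * SIGMA**2, 0.0, m)

t = 4.0
print(mean_r(t), R0 * math.exp(-BETA * t) + ALPHA / BETA * (1 - math.exp(-BETA * t)))
print(cov_r(t, t), SIGMA**2 / (2 * BETA) * (1 - math.exp(-2 * BETA * t)))
```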
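We want to study \(\int_0^T r(t)\,dt\). To do this, we define

\[
X(t) = \int_0^t e^{K(u)}\sigma(u)\,dW(u), \qquad Y(T) = \int_0^T e^{-K(t)}X(t)\,dt.
\]

Then

\[
r(t) = e^{-K(t)}\left[r(0) + \int_0^t e^{K(u)}\alpha(u)\,du\right] + e^{-K(t)}X(t),
\]
\[
\int_0^T r(t)\,dt = \int_0^T e^{-K(t)}\left[r(0) + \int_0^t e^{K(u)}\alpha(u)\,du\right] dt + Y(T).
\]

According to Theorem 1.70 in Chapter 29 (with \(\delta(u) = e^{K(u)}\sigma(u)\) and \(h(t) = e^{-K(t)}\)), \(\int_0^T r(t)\,dt\) is normal. Its mean is

\[
\mathbb{E}\int_0^T r(t)\,dt = \int_0^T e^{-K(t)}\left[r(0) + \int_0^t e^{K(u)}\alpha(u)\,du\right] dt, \tag{0.3}
\]

and its variance is

\[
\operatorname{var}\left(\int_0^T r(t)\,dt\right) = \mathbb{E}Y^2(T) = \int_0^T e^{2K(v)}\sigma^2(v)\left(\int_v^T e^{-K(y)}\,dy\right)^2 dv.
\]

The price at time 0 of a zero-coupon bond paying $1 at time \(T\) is therefore, using \(\mathbb{E}e^{-Z} = e^{-\mathbb{E}Z + \frac12\operatorname{var}Z}\) for a normal random variable \(Z\),

\[
\begin{aligned}
B(0,T) &= \mathbb{E}\exp\left\{-\int_0^T r(t)\,dt\right\}
= \exp\left\{-\mathbb{E}\int_0^T r(t)\,dt + \frac12\operatorname{var}\left(\int_0^T r(t)\,dt\right)\right\} \\
&= \exp\{-r(0)C(0,T) - A(0,T)\},
\end{aligned}
\]

where

\[
C(0,T) = \int_0^T e^{-K(t)}\,dt, \qquad
A(0,T) = \int_0^T e^{-K(t)}\left(\int_0^t e^{K(u)}\alpha(u)\,du\right) dt - \frac12\int_0^T e^{2K(v)}\sigma^2(v)\left(\int_v^T e^{-K(y)}\,dy\right)^2 dv.
\]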
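Figure 30.1: Range of values of u, t for the integral.

30.1 Fiddling with the formulas

Note that (see Fig 30.1), interchanging the order of integration,

\[
\int_0^T e^{-K(t)}\left(\int_0^t e^{K(u)}\alpha(u)\,du\right) dt = \int_0^T e^{K(v)}\alpha(v)\left(\int_v^T e^{-K(y)}\,dy\right) dv,
\]

so

\[
B(0,T) = \exp\{-r(0)C(0,T) - A(0,T)\},
\]

where

\[
C(0,T) = \int_0^T e^{-K(y)}\,dy, \qquad
A(0,T) = \int_0^T \left[e^{K(v)}\alpha(v)\left(\int_v^T e^{-K(y)}\,dy\right) - \frac12 e^{2K(v)}\sigma^2(v)\left(\int_v^T e^{-K(y)}\,dy\right)^2\right] dv.
\]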
Consider the price at time \(t \in [0,T]\) of the zero-coupon bond:

\[
B(t,T) = \mathbb{E}\left[\exp\left\{-\int_t^T r(u)\,du\right\}\,\Big|\,\mathcal{F}(t)\right].
\]

Because \(r\) is a Markov process, this should be random only through a dependence on \(r(t)\). In fact,

\[
B(t,T) = \exp\{-r(t)C(t,T) - A(t,T)\},
\]

where

\[
A(t,T) = \int_t^T \left[e^{K(v)}\alpha(v)\left(\int_v^T e^{-K(y)}\,dy\right) - \frac12 e^{2K(v)}\sigma^2(v)\left(\int_v^T e^{-K(y)}\,dy\right)^2\right] dv,
\]
\[
C(t,T) = e^{K(t)}\int_t^T e^{-K(y)}\,dy.
\]

The reason for these changes is the following. We are now taking the initial time to be \(t\) rather than zero, so it is plausible that \(\int_0^T \dots\,dv\) should be replaced by \(\int_t^T \dots\,dv\). Recall that

\[
K(v) = \int_0^v \beta(u)\,du,
\]

and this should be replaced by

\[
K(v) - K(t) = \int_t^v \beta(u)\,du.
\]

Similarly, \(K(y)\) should be replaced by \(K(y) - K(t)\). Making these replacements in \(A(0,T)\), we see that the \(K(t)\) terms cancel. In \(C(0,T)\), however, the \(K(t)\) term does not cancel.


30.2 Dynamics of the bond price

Let \(C_t(t,T)\) and \(A_t(t,T)\) denote the partial derivatives with respect to \(t\). From the formula

\[
B(t,T) = \exp\{-r(t)C(t,T) - A(t,T)\},
\]

we have

\[
\begin{aligned}
dB(t,T) &= B(t,T)\left[-C(t,T)\,dr(t) + \frac12 C^2(t,T)\,dr(t)\,dr(t) - r(t)C_t(t,T)\,dt - A_t(t,T)\,dt\right] \\
&= B(t,T)\Big[-C(t,T)\left(\alpha(t) - \beta(t)r(t)\right) dt - C(t,T)\sigma(t)\,dW(t) + \frac12 C^2(t,T)\sigma^2(t)\,dt \\
&\qquad\qquad - r(t)C_t(t,T)\,dt - A_t(t,T)\,dt\Big].
\end{aligned}
\]

Because we have used the risk-neutral pricing formula

\[
B(t,T) = \mathbb{E}\left[\exp\left\{-\int_t^T r(u)\,du\right\}\,\Big|\,\mathcal{F}(t)\right]
\]

to obtain the bond price, its differential must be of the form

\[
dB(t,T) = r(t)B(t,T)\,dt + (\dots)\,dW(t).
\]

Therefore, we must have

\[
-C(t,T)\left(\alpha(t) - \beta(t)r(t)\right) + \frac12 C^2(t,T)\sigma^2(t) - r(t)C_t(t,T) - A_t(t,T) = r(t).
\]

We leave the verification of this equation to the homework. After this verification, we have the formula

\[
dB(t,T) = r(t)B(t,T)\,dt - \sigma(t)C(t,T)B(t,T)\,dW(t).
\]

In particular, the volatility of the bond price is \(\sigma(t)C(t,T)\).
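The homework identity can be spot-checked numerically before proving it. The sketch below assumes hypothetical constant coefficients, so that \(C(t,T) = (1-e^{-\beta(T-t)})/\beta\) and \(A(t,T)\) have the closed forms obtained by evaluating the integrals above. It approximates \(C_t\) and \(A_t\) by central differences and reports the residual of the identity for several values of \(r\). This is a numerical plausibility check, not a proof.

```python
import math

ALPHA, BETA, SIGMA, T = 0.02, 0.5, 0.01, 5.0   # hypothetical constant coefficients

def C(t):
    """C(t,T) = e^{K(t)} int_t^T e^{-K(y)} dy with K(t) = BETA*t."""
    return (1 - math.exp(-BETA * (T - t))) / BETA

def A(t):
    """A(t,T) for constant coefficients (closed form of the integral in Section 30.1)."""
    c = C(t)
    return (ALPHA / BETA - SIGMA**2 / (2 * BETA**2)) * ((T - t) - c) + SIGMA**2 * c * c / (4 * BETA)

def residual(r, t, h=1e-5):
    """LHS - RHS of -C(alpha - beta r) + C^2 sigma^2 / 2 - r C_t - A_t = r."""
    C_t = (C(t + h) - C(t - h)) / (2 * h)
    A_t = (A(t + h) - A(t - h)) / (2 * h)
    lhs = -C(t) * (ALPHA - BETA * r) + 0.5 * C(t) ** 2 * SIGMA**2 - r * C_t - A_t
    return lhs - r

for r in (-0.01, 0.0, 0.05):
    print(r, residual(r, t=1.0))   # each residual is ~0 up to finite-difference error
```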


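30.3 Calibration of the Hull & White model

Recall:

\[
dr(t) = \left(\alpha(t) - \beta(t)r(t)\right) dt + \sigma(t)\,dW(t),
\]
\[
C(t,T) = e^{K(t)}\int_t^T e^{-K(y)}\,dy,
\]
\[
B(t,T) = \exp\{-r(t)C(t,T) - A(t,T)\}.
\]

Suppose we obtain \(B(0,T)\) for all \(T \in [0,T^*]\) from market data (with some interpolation). Can we determine the functions \(\alpha(t)\), \(\beta(t)\), and \(\sigma(t)\) for all \(t \in [0,T^*]\)? Not quite. Here is what we can do.

We take the following input data for the calibration:

1. \(B(0,T)\), \(0 \le T \le T^*\);
2. \(r(0) = -\frac{\partial}{\partial T}\log B(0,T)\Big|_{T=0}\);
3. \(\alpha(0)\);
4. \(\sigma(t)\), \(0 \le t \le T^*\) (usually assumed to be constant);
5. \(\sigma(0)C(0,T)\), \(0 \le T \le T^*\), i.e., the volatility at time zero of bonds of all maturities.

Step 1. From 4 and 5 we solve for

\[
C(0,T) = \frac{\sigma(0)C(0,T)}{\sigma(0)}, \quad 0 \le T \le T^*.
\]

We can then compute

\[
\frac{\partial}{\partial T}C(0,T) = e^{-K(T)}
\quad\Longrightarrow\quad
K(T) = -\log\frac{\partial}{\partial T}C(0,T),
\]
\[
\frac{\partial}{\partial T}K(T) = \frac{\partial}{\partial T}\int_0^T \beta(u)\,du = \beta(T).
\]

We now have \(\beta(T)\) for all \(T \in [0,T^*]\).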
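Step 1 amounts to two numerical differentiations of \(C(0,T)\). The sketch below manufactures \(C(0,T)\) from a hypothetical "true" \(\beta(t) = 0.3 + 0.1t\) (an assumption, not data). It recovers \(\beta\) with `numpy.gradient` and then repeats the recovery after adding noise of size \(10^{-6}\) to \(C(0,T)\). The noisy case anticipates the instability described in Remark 30.1.

```python
import numpy as np

# Hypothetical "true" mean-reversion speed, used only to manufacture C(0,T).
def beta_true(t):
    return 0.3 + 0.1 * t

dT = 1e-3
T = np.arange(0.0, 5.0 + dT / 2, dT)
K_true = 0.3 * T + 0.05 * T**2                         # K(T) = int_0^T beta(u) du
integrand = np.exp(-K_true)
# C(0,T) = int_0^T e^{-K(y)} dy by the cumulative trapezoid rule.
C0 = np.concatenate(([0.0], np.cumsum(0.5 * (integrand[1:] + integrand[:-1]) * dT)))

def recover_beta(C):
    """Step 1: K(T) = -log dC(0,T)/dT, then beta(T) = dK/dT (two numerical derivatives)."""
    K = -np.log(np.gradient(C, dT, edge_order=2))
    return np.gradient(K, dT, edge_order=2)

inner = slice(10, -10)                                 # ignore a few points at each end
clean_err = np.max(np.abs(recover_beta(C0)[inner] - beta_true(T)[inner]))

rng = np.random.default_rng(1)
noisy = C0 + 1e-6 * rng.standard_normal(C0.size)       # tiny perturbation of C(0,T)
noisy_err = np.max(np.abs(recover_beta(noisy)[inner] - beta_true(T)[inner]))
print(clean_err, noisy_err)
```

With clean input the recovery is essentially exact. A perturbation of one part in a million in \(C(0,T)\), however, produces errors in \(\beta\) that are many orders of magnitude larger. In practice, market-implied volatilities would therefore need smoothing before Step 1.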
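Step 2. From the formula

\[
B(0,T) = \exp\{-r(0)C(0,T) - A(0,T)\},
\]

we can solve for \(A(0,T)\) for all \(T \in [0,T^*]\). Recall that

\[
A(0,T) = \int_0^T \left[e^{K(v)}\alpha(v)\left(\int_v^T e^{-K(y)}\,dy\right) - \frac12 e^{2K(v)}\sigma^2(v)\left(\int_v^T e^{-K(y)}\,dy\right)^2\right] dv.
\]

We can use this formula to determine \(\alpha(T)\), \(0 \le T \le T^*\), as follows:

\[
\begin{aligned}
\frac{\partial}{\partial T}A(0,T) &= e^{-K(T)}\int_0^T e^{K(v)}\alpha(v)\,dv - e^{-K(T)}\int_0^T e^{2K(v)}\sigma^2(v)\left(\int_v^T e^{-K(y)}\,dy\right) dv, \\
e^{K(T)}\frac{\partial}{\partial T}A(0,T) &= \int_0^T e^{K(v)}\alpha(v)\,dv - \int_0^T e^{2K(v)}\sigma^2(v)\left(\int_v^T e^{-K(y)}\,dy\right) dv, \\
\frac{\partial}{\partial T}\left[e^{K(T)}\frac{\partial}{\partial T}A(0,T)\right] &= e^{K(T)}\alpha(T) - e^{-K(T)}\int_0^T e^{2K(v)}\sigma^2(v)\,dv, \\
e^{K(T)}\frac{\partial}{\partial T}\left[e^{K(T)}\frac{\partial}{\partial T}A(0,T)\right] &= e^{2K(T)}\alpha(T) - \int_0^T e^{2K(v)}\sigma^2(v)\,dv, \\
\frac{\partial}{\partial T}\left\{e^{K(T)}\frac{\partial}{\partial T}\left[e^{K(T)}\frac{\partial}{\partial T}A(0,T)\right]\right\} &= \alpha'(T)e^{2K(T)} + 2\alpha(T)\beta(T)e^{2K(T)} - e^{2K(T)}\sigma^2(T), \quad 0 \le T \le T^*.
\end{aligned}
\]

This gives us an ordinary differential equation for \(\alpha\), i.e.,

\[
\alpha'(t)e^{2K(t)} + 2\alpha(t)\beta(t)e^{2K(t)} - e^{2K(t)}\sigma^2(t) = \text{known function of } t.
\]

From assumption 4 and step 1, we know all the coefficients in this equation. From assumption 3, we have the initial condition \(\alpha(0)\). We can solve the equation numerically to determine the function \(\alpha(t)\), \(0 \le t \le T^*\).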


Remark 30.1 The derivation of the ordinary differential equation for \(\alpha(t)\) requires three differentiations. Differentiation is an unstable procedure, i.e., functions which are close can have very different derivatives. Consider, for example,

\[
f(x) = 0 \quad \forall x \in \mathbb{R}, \qquad
g(x) = \frac{\sin(1000x)}{100} \quad \forall x \in \mathbb{R}.
\]

Then

\[
|f(x) - g(x)| \le \frac{1}{100} \quad \forall x \in \mathbb{R},
\]

but because

\[
g'(x) = 10\cos(1000x),
\]

we have

\[
|f'(x) - g'(x)| = 10
\]

for many values of \(x\).
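The example in Remark 30.1 can be reproduced in a few lines. The sketch samples \(|f - g|\) on a fine grid and evaluates central-difference derivatives at the points \(x = k\pi/1000\), where \(|\cos(1000x)| = 1\). The grid and step sizes are arbitrary choices.

```python
import math

f = lambda x: 0.0
g = lambda x: math.sin(1000 * x) / 100

def derivative(fn, x, h=1e-7):
    """Central-difference approximation of fn'(x)."""
    return (fn(x + h) - fn(x - h)) / (2 * h)

sup_gap = max(abs(f(x) - g(x)) for x in (i * 1e-4 for i in range(100_000)))
xs = [k * math.pi / 1000 for k in range(5)]             # points where cos(1000 x) = +/-1
deriv_gaps = [abs(derivative(f, x) - derivative(g, x)) for x in xs]
print(sup_gap, deriv_gaps)                               # gap at most 0.01, derivative gaps near 10
```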
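Assumption 5 for the calibration was that we know the volatility at time zero of bonds of all maturities. These volatilities can be implied by the prices of options on bonds. We consider now how the model prices options.

30.4 Option on a bond

Consider a European call option on a zero-coupon bond with strike price \(K\) and expiration time \(T_1\). (The strike \(K\) is a constant and should not be confused with the function \(K(t) = \int_0^t \beta(u)\,du\).) The bond matures at time \(T_2 > T_1\). The price of the option at time 0 is

\[
\begin{aligned}
&\mathbb{E}\left[e^{-\int_0^{T_1} r(u)\,du}\left(B(T_1,T_2) - K\right)^+\right] \\
&\quad= \mathbb{E}\left[e^{-\int_0^{T_1} r(u)\,du}\left(\exp\{-r(T_1)C(T_1,T_2) - A(T_1,T_2)\} - K\right)^+\right] \\
&\quad= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-x}\left(\exp\{-yC(T_1,T_2) - A(T_1,T_2)\} - K\right)^+ f(x,y)\,dx\,dy,
\end{aligned}
\]

where \(f(x,y)\) is the joint density of \(\left(\int_0^{T_1} r(u)\,du,\ r(T_1)\right)\).

We observed at the beginning of this Chapter (equation (0.3)) that \(\int_0^{T_1} r(u)\,du\) is normal with

\[
\mu_1 = \mathbb{E}\int_0^{T_1} r(u)\,du = \int_0^{T_1} \mathbb{E}r(u)\,du, \qquad
\sigma_1^2 = \operatorname{var}\left(\int_0^{T_1} r(u)\,du\right) = \int_0^{T_1} e^{2K(v)}\sigma^2(v)\left(\int_v^{T_1} e^{-K(y)}\,dy\right)^2 dv.
\]

We also observed (equation (0.1)) that \(r(T_1)\) is normal with

\[
\mu_2 = \mathbb{E}r(T_1) = r(0)e^{-K(T_1)} + e^{-K(T_1)}\int_0^{T_1} e^{K(u)}\alpha(u)\,du,
\qquad
\sigma_2^2 = \operatorname{var}\left(r(T_1)\right) = e^{-2K(T_1)}\int_0^{T_1} e^{2K(u)}\sigma^2(u)\,du.
\]

In fact, \(\left(\int_0^{T_1} r(u)\,du,\ r(T_1)\right)\) is jointly normal, and the covariance is

\[
\begin{aligned}
\rho\sigma_1\sigma_2 &= \mathbb{E}\left[\int_0^{T_1}\left(r(u) - \mathbb{E}r(u)\right) du\cdot\left(r(T_1) - \mathbb{E}r(T_1)\right)\right] \\
&= \int_0^{T_1} \mathbb{E}\left[\left(r(u) - \mathbb{E}r(u)\right)\left(r(T_1) - \mathbb{E}r(T_1)\right)\right] du \\
&= \int_0^{T_1} \rho_r(u,T_1)\,du,
\end{aligned}
\]

where \(\rho_r(u,T_1)\) is defined in Equation (0.2).

The option on the bond has price at time zero of

\[
\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-x}\left(\exp\{-yC(T_1,T_2) - A(T_1,T_2)\} - K\right)^+
\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}
\exp\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_1)^2}{\sigma_1^2} - \frac{2\rho(x-\mu_1)(y-\mu_2)}{\sigma_1\sigma_2} + \frac{(y-\mu_2)^2}{\sigma_2^2}\right]\right\} dx\,dy. \tag{4.1}
\]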
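Formula (4.1) can be evaluated by sampling the bivariate normal pair instead of doing the double integral. The sketch below assumes hypothetical constant coefficients. For constant coefficients, \(\mu_1, \sigma_1^2, \mu_2, \sigma_2^2\) and the covariance evaluate to the closed forms in the code (our evaluation of the integrals above, with the covariance equal to \(\frac12\sigma^2C(0,T_1)^2\)). As a built-in consistency check, discounting the time-\(T_1\) value of the \(T_2\)-bond should return \(B(0,T_2)\).

```python
import numpy as np

# Hypothetical constant Hull-White (Vasicek) parameters and contract terms.
ALPHA, BETA, SIGMA, R0 = 0.02, 0.5, 0.01, 0.03
T1, T2, STRIKE = 1.0, 3.0, 0.92

def C(tau):                      # C(t,T) for constant beta, tau = T - t
    return (1 - np.exp(-BETA * tau)) / BETA

def A(tau):                      # A(t,T) for constant coefficients, tau = T - t
    c = C(tau)
    return (ALPHA / BETA - SIGMA**2 / (2 * BETA**2)) * (tau - c) + SIGMA**2 * c * c / (4 * BETA)

def zero_price(tau):             # B(0,tau) = exp{-r(0)C - A}
    return np.exp(-R0 * C(tau) - A(tau))

# Moments of (int_0^{T1} r du, r(T1)), evaluated in closed form for constant coefficients.
c1 = C(T1)
mu1 = ALPHA / BETA * T1 + (R0 - ALPHA / BETA) * c1
var1 = SIGMA**2 / BETA**2 * (T1 - c1 - 0.5 * BETA * c1 * c1)
mu2 = R0 * np.exp(-BETA * T1) + ALPHA / BETA * (1 - np.exp(-BETA * T1))
var2 = SIGMA**2 / (2 * BETA) * (1 - np.exp(-2 * BETA * T1))
cov12 = 0.5 * SIGMA**2 * c1 * c1

rng = np.random.default_rng(0)
xy = rng.multivariate_normal([mu1, mu2], [[var1, cov12], [cov12, var2]], size=400_000)
x, y = xy[:, 0], xy[:, 1]
bond_at_T1 = np.exp(-y * C(T2 - T1) - A(T2 - T1))       # B(T1,T2) as a function of r(T1) = y
call = np.mean(np.exp(-x) * np.maximum(bond_at_T1 - STRIKE, 0.0))

print(call, np.mean(np.exp(-x) * bond_at_T1), zero_price(T2))
```

The Monte Carlo estimate is only a sketch. In production one would more likely evaluate (4.1) by quadrature, or use a closed-form bond-option formula.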
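The price of the option at time \(t \in [0,T_1]\) is

\[
\begin{aligned}
&\mathbb{E}\left[e^{-\int_t^{T_1} r(u)\,du}\left(B(T_1,T_2) - K\right)^+\,\Big|\,\mathcal{F}(t)\right] \\
&\quad= \mathbb{E}\left[e^{-\int_t^{T_1} r(u)\,du}\left(\exp\{-r(T_1)C(T_1,T_2) - A(T_1,T_2)\} - K\right)^+\,\Big|\,\mathcal{F}(t)\right].
\end{aligned} \tag{4.2}
\]

Because of the Markov property, this is random only through a dependence on \(r(t)\). To compute this option price, we need the joint distribution of \(\left(\int_t^{T_1} r(u)\,du,\ r(T_1)\right)\) conditioned on \(r(t)\). This pair of random variables has a jointly normal conditional distribution, and

\[
\begin{aligned}
\mu_1(t) &= \mathbb{E}\left[\int_t^{T_1} r(u)\,du\,\Big|\,\mathcal{F}(t)\right] = \int_t^{T_1}\left[r(t)e^{-(K(u)-K(t))} + e^{-K(u)}\int_t^u e^{K(v)}\alpha(v)\,dv\right] du, \\
\mu_2(t) &= \mathbb{E}\left[r(T_1)\,\big|\,\mathcal{F}(t)\right] = r(t)e^{-(K(T_1)-K(t))} + e^{-K(T_1)}\int_t^{T_1} e^{K(u)}\alpha(u)\,du, \\
\sigma_1^2(t) &= \mathbb{E}\left[\left(\int_t^{T_1} r(u)\,du - \mu_1(t)\right)^2\,\Big|\,\mathcal{F}(t)\right] = \int_t^{T_1} e^{2K(v)}\sigma^2(v)\left(\int_v^{T_1} e^{-K(y)}\,dy\right)^2 dv, \\
\sigma_2^2(t) &= \mathbb{E}\left[\left(r(T_1) - \mu_2(t)\right)^2\,\big|\,\mathcal{F}(t)\right] = e^{-2K(T_1)}\int_t^{T_1} e^{2K(u)}\sigma^2(u)\,du, \\
\rho(t)\sigma_1(t)\sigma_2(t) &= \mathbb{E}\left[\left(\int_t^{T_1} r(u)\,du - \mu_1(t)\right)\left(r(T_1) - \mu_2(t)\right)\,\Big|\,\mathcal{F}(t)\right] = \int_t^{T_1}\int_t^u e^{-K(T_1)-K(u)+2K(v)}\sigma^2(v)\,dv\,du.
\end{aligned}
\]

The variances and covariances are not random. The means are random through a dependence on \(r(t)\).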
Advantages of the Hull & White model:

1. Leads to closed-form pricing formulas.
2. Allows calibration to fit initial yield curve exactly.

Short-comings of the Hull & White model:

1. One-factor, so only allows parallel shifts of the yield curve, i.e.,

\[
B(t,T) = \exp\{-r(t)C(t,T) - A(t,T)\},
\]

so bond prices of all maturities are perfectly correlated.

2. Interest rate is normally distributed, and hence can take negative values. Consequently, the bond price

\[
B(t,T) = \mathbb{E}\left[\exp\left\{-\int_t^T r(u)\,du\right\}\,\Big|\,\mathcal{F}(t)\right]
\]

can exceed 1.
Chapter 31

Cox-Ingersoll-Ross model

In the Hull & White model, \(r(t)\) is a Gaussian process. Since, for each \(t\), \(r(t)\) is normally distributed, there is a positive probability that \(r(t) < 0\). The Cox-Ingersoll-Ross model is the simplest one which avoids negative interest rates.

We begin with a \(d\)-dimensional Brownian motion \((W_1, W_2, \dots, W_d)\). Let \(\beta > 0\) and \(\sigma > 0\) be constants. For \(j = 1, \dots, d\), let \(X_j(0) \in \mathbb{R}\) be given so that

\[
X_1^2(0) + X_2^2(0) + \dots + X_d^2(0) > 0,
\]

and let \(X_j\) be the solution to the stochastic differential equation

\[
dX_j(t) = -\frac12\beta X_j(t)\,dt + \frac12\sigma\,dW_j(t).
\]

\(X_j\) is called the Ornstein-Uhlenbeck process. It always has a drift toward the origin. The solution to this stochastic differential equation is

\[
X_j(t) = e^{-\frac12\beta t}\left[X_j(0) + \frac12\sigma\int_0^t e^{\frac12\beta u}\,dW_j(u)\right].
\]

This solution is a Gaussian process with mean function

\[
m_j(t) = e^{-\frac12\beta t}X_j(0)
\]

and covariance function

\[
\rho(s,t) = \frac14\sigma^2 e^{-\frac12\beta(s+t)}\int_0^{s\wedge t} e^{\beta u}\,du = \frac{\sigma^2}{4\beta}e^{-\frac12\beta(s+t)}\left(e^{\beta(s\wedge t)} - 1\right).
\]

Define

\[
r(t) = X_1^2(t) + X_2^2(t) + \dots + X_d^2(t).
\]

If \(d = 1\), we have \(r(t) = X_1^2(t)\) and for each \(t\), \(\mathbb{P}\{r(t) > 0\} = 1\), but (see Fig. 31.1)

\[
\mathbb{P}\{\text{There are infinitely many values of } t > 0 \text{ for which } r(t) = 0\} = 1.
\]
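Figure 31.1: r(t) can be zero.

If \(d \ge 2\), (see Fig. 31.1)

\[
\mathbb{P}\{\text{There is at least one value of } t > 0 \text{ for which } r(t) = 0\} = 0.
\]

Let \(f(x_1, x_2, \dots, x_d) = x_1^2 + x_2^2 + \dots + x_d^2\). Then

\[
f_{x_i} = 2x_i, \qquad
f_{x_ix_j} = \begin{cases} 2 & \text{if } i = j, \\ 0 & \text{if } i \ne j. \end{cases}
\]

Itô's formula implies

\[
\begin{aligned}
dr(t) &= \sum_{i=1}^d f_{x_i}\,dX_i + \frac12\sum_{i=1}^d\sum_{j=1}^d f_{x_ix_j}\,dX_i\,dX_j \\
&= \sum_{i=1}^d 2X_i\left(-\frac12\beta X_i\,dt + \frac12\sigma\,dW_i(t)\right) + \sum_{i=1}^d \frac14\sigma^2\,dt \\
&= -\beta r(t)\,dt + \sigma\sum_{i=1}^d X_i\,dW_i + \frac{d\sigma^2}{4}\,dt \\
&= \left(\frac{d\sigma^2}{4} - \beta r(t)\right) dt + \sigma\sqrt{r(t)}\sum_{i=1}^d \frac{X_i}{\sqrt{r(t)}}\,dW_i(t).
\end{aligned}
\]

Define

\[
W(t) = \sum_{i=1}^d \int_0^t \frac{X_i(u)}{\sqrt{r(u)}}\,dW_i(u).
\]

Then \(W\) is a martingale,

\[
dW = \sum_{i=1}^d \frac{X_i}{\sqrt{r}}\,dW_i, \qquad
dW\,dW = \sum_{i=1}^d \frac{X_i^2}{r}\,dt = dt,
\]

so \(W\) is a Brownian motion (by Lévy's characterization). We have

\[
dr(t) = \left(\frac{d\sigma^2}{4} - \beta r(t)\right) dt + \sigma\sqrt{r(t)}\,dW(t).
\]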
The Cox-Ingersoll-Ross (CIR) process is given by

\[
dr(t) = \left(\alpha - \beta r(t)\right) dt + \sigma\sqrt{r(t)}\,dW(t).
\]

We define

\[
d = \frac{4\alpha}{\sigma^2} > 0.
\]

If \(d\) happens to be an integer, then we have the representation

\[
r(t) = \sum_{j=1}^d X_j^2(t),
\]

but we do not require \(d\) to be an integer. If \(d < 2\) (i.e., \(\alpha < \frac12\sigma^2\)), then

\[
\mathbb{P}\{\text{There are infinitely many values of } t > 0 \text{ for which } r(t) = 0\} = 1.
\]

This is not a good parameter choice.

If \(d \ge 2\) (i.e., \(\alpha \ge \frac12\sigma^2\)), then

\[
\mathbb{P}\{\text{There is at least one value of } t > 0 \text{ for which } r(t) = 0\} = 0.
\]

With the CIR process, one can derive formulas under the assumption that \(d = \frac{4\alpha}{\sigma^2}\) is a positive integer, and they are still correct even when \(d\) is not an integer.

For example, here is the distribution of \(r(t)\) for fixed \(t > 0\). Let \(r(0) > 0\) be given. Take

\[
X_1(0) = 0,\ X_2(0) = 0,\ \dots,\ X_{d-1}(0) = 0,\ X_d(0) = \sqrt{r(0)}.
\]

For \(i = 1, 2, \dots, d-1\), \(X_i(t)\) is normal with mean zero and variance

\[
\rho(t,t) = \frac{\sigma^2}{4\beta}\left(1 - e^{-\beta t}\right).
\]

\(X_d(t)\) is normal with mean

\[
m_d(t) = e^{-\frac12\beta t}\sqrt{r(0)}
\]

and variance \(\rho(t,t)\). Then

\[
r(t) = \rho(t,t)\underbrace{\sum_{i=1}^{d-1}\left(\frac{X_i(t)}{\sqrt{\rho(t,t)}}\right)^2}_{\text{chi-square with } d-1 = \frac{4\alpha}{\sigma^2} - 1 \text{ degrees of freedom}} + \underbrace{X_d^2(t)}_{\text{normal squared and independent of the other term}}. \tag{0.1}
\]

Thus \(r(t)\) has a non-central chi-square distribution.
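When \(d\) is an integer, the representation (0.1) gives a direct way to sample \(r(t)\) exactly. The sketch below assumes hypothetical parameters chosen so that \(d = 4\). It draws the \(d\) independent normals described above, forms \(r(t) = \sum_j X_j^2(t)\), and compares the sample mean with the CIR mean \(r(0)e^{-\beta t} + \frac{\alpha}{\beta}(1-e^{-\beta t})\). That mean also follows from (0.1) as \(d\,\rho(t,t) + m_d^2(t)\).

```python
import numpy as np

# Hypothetical parameters with d = 4*alpha/sigma^2 equal to the integer 4.
BETA, SIGMA, D, R0, t = 0.5, 0.1, 4, 0.04, 2.0
ALPHA = D * SIGMA**2 / 4

rho_tt = SIGMA**2 / (4 * BETA) * (1 - np.exp(-BETA * t))    # variance of each X_i(t)
mean_d = np.exp(-0.5 * BETA * t) * np.sqrt(R0)               # mean of X_d(t); the others have mean 0

rng = np.random.default_rng(0)
n = 400_000
X = rng.standard_normal((n, D)) * np.sqrt(rho_tt)           # independent normals with variance rho(t,t)
X[:, -1] += mean_d                                           # shift the last coordinate to mean m_d(t)
r_t = (X**2).sum(axis=1)                                     # r(t) = X_1^2(t) + ... + X_d^2(t)

# CIR mean E r(t) = r(0) e^{-beta t} + (alpha/beta)(1 - e^{-beta t}), for comparison.
exact_mean = R0 * np.exp(-BETA * t) + ALPHA / BETA * (1 - np.exp(-BETA * t))
print(r_t.mean(), exact_mean)
```

Every sample is nonnegative by construction. This is the property that distinguishes CIR from the Gaussian Hull & White short rate.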
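31.1 Equilibrium distribution of r(t)

As \(t \to \infty\), \(m_d(t) \to 0\). We have

\[
r(t) = \rho(t,t)\sum_{i=1}^d \left(\frac{X_i(t)}{\sqrt{\rho(t,t)}}\right)^2.
\]

As \(t \to \infty\), we have \(\rho(t,t) \to \frac{\sigma^2}{4\beta}\), and so the limiting distribution of \(r(t)\) is \(\frac{\sigma^2}{4\beta}\) times a chi-square with \(d = \frac{4\alpha}{\sigma^2}\) degrees of freedom. The chi-square density with \(\frac{4\alpha}{\sigma^2}\) degrees of freedom is

\[
f(y) = \frac{1}{2^{2\alpha/\sigma^2}\,\Gamma\!\left(\frac{2\alpha}{\sigma^2}\right)}\,y^{\frac{2\alpha}{\sigma^2} - 1}e^{-y/2}, \quad y > 0.
\]

We make the change of variable \(r = \frac{\sigma^2}{4\beta}y\). The limiting density for \(r(t)\) is

\[
p(r) = \frac{4\beta}{\sigma^2}\,f\!\left(\frac{4\beta}{\sigma^2}r\right)
= \frac{1}{\Gamma\!\left(\frac{2\alpha}{\sigma^2}\right)}\left(\frac{2\beta}{\sigma^2}\right)^{\frac{2\alpha}{\sigma^2}} r^{\frac{2\alpha}{\sigma^2} - 1}e^{-\frac{2\beta}{\sigma^2}r}, \quad r > 0.
\]

We computed the mean and variance of \(r(t)\) in Section 15.7.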
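A quick numerical check of the limiting density: for hypothetical parameters, the sketch integrates \(p(r)\) with Simpson's rule and confirms that it has total mass 1 and mean \(\alpha/\beta\), the long-run mean of the CIR process. The cutoff at \(r = 1\) is an assumption that is only valid for these parameters, because the density is negligible beyond it.

```python
import math

# Hypothetical CIR parameters (assumptions for illustration): 2*alpha/sigma^2 = 4.
ALPHA, BETA, SIGMA = 0.02, 0.5, 0.1
shape = 2 * ALPHA / SIGMA**2          # exponent parameter 2 alpha / sigma^2
rate = 2 * BETA / SIGMA**2            # decay parameter 2 beta / sigma^2

def p(r):
    """Limiting (equilibrium) density of r(t) from Section 31.1."""
    return rate**shape * r ** (shape - 1) * math.exp(-rate * r) / math.gamma(shape)

def simpson(fn, a, b, n=20_000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = fn(a) + fn(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * fn(a + k * h)
    return total * h / 3.0

mass = simpson(p, 0.0, 1.0)                      # density is negligible beyond r = 1 here
mean = simpson(lambda r: r * p(r), 0.0, 1.0)
print(mass, mean, ALPHA / BETA)
```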


31.2 Kolmogorov forward equation

Consider a Markov process governed by the stochastic differential equation

\[
dX(t) = b(X(t))\,dt + \sigma(X(t))\,dW(t).
\]

Figure 31.2: The function h(y)

Because we are going to apply the following analysis to the case \(X(t) = r(t)\), we assume that \(X(t) > 0\) for all \(t\).

We start at \(X(0) = x > 0\) at time 0. Then \(X(t)\) is random with density \(p(0,t,x,y)\) (in the \(y\) variable). Since 0 and \(x\) will not change during the following, we omit them and write \(p(t,y)\) rather than \(p(0,t,x,y)\). We have

\[
\mathbb{E}h(X(t)) = \int_0^\infty h(y)p(t,y)\,dy
\]

for any function \(h\).


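The Kolmogorov forward equation (KFE) is a partial differential equation in the "forward" variables \(t\) and \(y\). We derive it below.

Let \(h(y)\) be a smooth function of \(y \ge 0\) which vanishes near \(y = 0\) and for all large values of \(y\) (see Fig. 31.2). Itô's formula implies

\[
dh(X(t)) = \left[h'(X(t))b(X(t)) + \frac12 h''(X(t))\sigma^2(X(t))\right] dt + h'(X(t))\sigma(X(t))\,dW(t),
\]

so

\[
h(X(t)) = h(X(0)) + \int_0^t \left[h'(X(s))b(X(s)) + \frac12 h''(X(s))\sigma^2(X(s))\right] ds + \int_0^t h'(X(s))\sigma(X(s))\,dW(s),
\]
\[
\mathbb{E}h(X(t)) = h(X(0)) + \mathbb{E}\int_0^t \left[h'(X(s))b(X(s)) + \frac12 h''(X(s))\sigma^2(X(s))\right] ds,
\]

or equivalently,

\[
\int_0^\infty h(y)p(t,y)\,dy = h(x) + \int_0^t\int_0^\infty h'(y)b(y)p(s,y)\,dy\,ds + \frac12\int_0^t\int_0^\infty h''(y)\sigma^2(y)p(s,y)\,dy\,ds.
\]

Differentiate with respect to \(t\) to get

\[
\int_0^\infty h(y)p_t(t,y)\,dy = \int_0^\infty h'(y)b(y)p(t,y)\,dy + \frac12\int_0^\infty h''(y)\sigma^2(y)p(t,y)\,dy.
\]

Integration by parts yields

\[
\int_0^\infty h'(y)b(y)p(t,y)\,dy = \underbrace{h(y)b(y)p(t,y)\Big|_{y=0}^{y=\infty}}_{=0} - \int_0^\infty h(y)\frac{\partial}{\partial y}\left(b(y)p(t,y)\right) dy,
\]
\[
\int_0^\infty h''(y)\sigma^2(y)p(t,y)\,dy = \underbrace{h'(y)\sigma^2(y)p(t,y)\Big|_{y=0}^{y=\infty}}_{=0} - \int_0^\infty h'(y)\frac{\partial}{\partial y}\left(\sigma^2(y)p(t,y)\right) dy
= \int_0^\infty h(y)\frac{\partial^2}{\partial y^2}\left(\sigma^2(y)p(t,y)\right) dy.
\]

Therefore,

\[
\int_0^\infty h(y)p_t(t,y)\,dy = -\int_0^\infty h(y)\frac{\partial}{\partial y}\left(b(y)p(t,y)\right) dy + \frac12\int_0^\infty h(y)\frac{\partial^2}{\partial y^2}\left(\sigma^2(y)p(t,y)\right) dy,
\]

or equivalently,

\[
\int_0^\infty h(y)\left[p_t(t,y) + \frac{\partial}{\partial y}\left(b(y)p(t,y)\right) - \frac12\frac{\partial^2}{\partial y^2}\left(\sigma^2(y)p(t,y)\right)\right] dy = 0.
\]

This last equation holds for every function \(h\) of the form in Figure 31.2. It implies that

\[
p_t(t,y) + \frac{\partial}{\partial y}\left(b(y)p(t,y)\right) - \frac12\frac{\partial^2}{\partial y^2}\left(\sigma^2(y)p(t,y)\right) = 0. \tag{KFE}
\]

If there were a place where (KFE) did not hold, then we could take \(h(y) > 0\) at that and nearby points, but take \(h\) to be zero elsewhere, and we would obtain

\[
\int_0^\infty h(y)\left[p_t + \frac{\partial}{\partial y}(bp) - \frac12\frac{\partial^2}{\partial y^2}(\sigma^2p)\right] dy \ne 0.
\]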
If the process \(X(t)\) has an equilibrium density, it will be

\[
p(y) = \lim_{t\to\infty} p(t,y).
\]

In order for this limit to exist, we must have

\[
0 = \lim_{t\to\infty} p_t(t,y).
\]

Letting \(t \to \infty\) in (KFE), we obtain the equilibrium Kolmogorov forward equation

\[
\frac{\partial}{\partial y}\left(b(y)p(y)\right) - \frac12\frac{\partial^2}{\partial y^2}\left(\sigma^2(y)p(y)\right) = 0.
\]

When an equilibrium density exists, it is the unique solution to this equation satisfying

\[
p(y) \ge 0 \quad \forall y \ge 0, \qquad \int_0^\infty p(y)\,dy = 1.
\]


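31.3 Cox-Ingersoll-Ross equilibrium density

We computed this to be

\[
p(r) = \frac{1}{\Gamma\!\left(\frac{2\alpha}{\sigma^2}\right)}\left(\frac{2\beta}{\sigma^2}\right)^{\frac{2\alpha}{\sigma^2}} r^{\frac{2\alpha}{\sigma^2} - 1}e^{-\frac{2\beta}{\sigma^2}r}, \quad r > 0.
\]

We compute

\[
\begin{aligned}
p'(r) &= \left(\frac{2\alpha - \sigma^2}{\sigma^2 r} - \frac{2\beta}{\sigma^2}\right)p(r) = \frac{2}{\sigma^2 r}\left(\alpha - \frac12\sigma^2 - \beta r\right)p(r), \\
p''(r) &= -\frac{2}{\sigma^2 r^2}\left(\alpha - \frac12\sigma^2 - \beta r\right)p(r) - \frac{2\beta}{\sigma^2 r}p(r) + \frac{2}{\sigma^2 r}\left(\alpha - \frac12\sigma^2 - \beta r\right)p'(r) \\
&= \left[\frac{4}{\sigma^4 r^2}\left(\alpha - \frac12\sigma^2 - \beta r\right)^2 - \frac{2}{\sigma^2 r^2}\left(\alpha - \frac12\sigma^2 - \beta r\right) - \frac{2\beta}{\sigma^2 r}\right]p(r).
\end{aligned}
\]

We want to verify the equilibrium Kolmogorov forward equation for the CIR process:

\[
\frac{\partial}{\partial r}\left((\alpha - \beta r)p(r)\right) - \frac12\frac{\partial^2}{\partial r^2}\left(\sigma^2 r\,p(r)\right) = 0. \tag{EKFE}
\]

The LHS of (EKFE) becomes

\[
\begin{aligned}
&-\beta p(r) + (\alpha - \beta r)p'(r) - \sigma^2 p'(r) - \frac12\sigma^2 r\,p''(r) \\
&= p(r)\left[-\beta + \frac{2}{\sigma^2 r}\left(\alpha - \beta r - \sigma^2\right)\left(\alpha - \frac12\sigma^2 - \beta r\right)
- \frac{2}{\sigma^2 r}\left(\alpha - \frac12\sigma^2 - \beta r\right)^2 + \frac{1}{r}\left(\alpha - \frac12\sigma^2 - \beta r\right) + \beta\right] \\
&= p(r)\,\frac{2}{\sigma^2 r}\left(\alpha - \frac12\sigma^2 - \beta r\right)\left[\left(\alpha - \beta r - \sigma^2\right) - \left(\alpha - \frac12\sigma^2 - \beta r\right) + \frac12\sigma^2\right] \\
&= 0,
\end{aligned}
\]

as expected.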
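The algebra above can be cross-checked with finite differences. For hypothetical parameters, the sketch evaluates the two terms of (EKFE) separately at a few points: the drift term \(\frac{\partial}{\partial r}((\alpha-\beta r)p)\) and the diffusion term \(\frac12\frac{\partial^2}{\partial r^2}(\sigma^2 r p)\). Each term is large on its own, while their difference is zero up to discretization error.

```python
import math

ALPHA, BETA, SIGMA = 0.02, 0.5, 0.1          # hypothetical CIR parameters
shape, rate = 2 * ALPHA / SIGMA**2, 2 * BETA / SIGMA**2

def p(r):
    """CIR equilibrium density."""
    return rate**shape * r ** (shape - 1) * math.exp(-rate * r) / math.gamma(shape)

def ekfe_terms(r, h=1e-5):
    """Return (d/dr[(alpha - beta r) p], (1/2) d^2/dr^2[sigma^2 r p]) by central differences."""
    drift = lambda x: (ALPHA - BETA * x) * p(x)
    diff = lambda x: SIGMA**2 * x * p(x)
    first = (drift(r + h) - drift(r - h)) / (2 * h)
    second = (diff(r + h) - 2 * diff(r) + diff(r - h)) / (h * h)
    return first, 0.5 * second

terms = {r: ekfe_terms(r) for r in (0.01, 0.04, 0.08)}
for r, (a, b) in terms.items():
    print(r, a, b, a - b)      # the two terms are large; their difference is ~0
```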


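31.4 Bond prices in the CIR model

The interest rate process \(r(t)\) is given by

\[
dr(t) = \left(\alpha - \beta r(t)\right) dt + \sigma\sqrt{r(t)}\,dW(t),
\]

where \(r(0)\) is given. The bond price process is

\[
B(t,T) = \mathbb{E}\left[\exp\left\{-\int_t^T r(u)\,du\right\}\,\Big|\,\mathcal{F}(t)\right].
\]

Because

\[
\exp\left\{-\int_0^t r(u)\,du\right\}B(t,T) = \mathbb{E}\left[\exp\left\{-\int_0^T r(u)\,du\right\}\,\Big|\,\mathcal{F}(t)\right],
\]

the tower property implies that this is a martingale. The Markov property implies that \(B(t,T)\) is random only through a dependence on \(r(t)\). Thus, there is a function \(B(r,t,T)\) of the three dummy variables \(r, t, T\) such that the process \(B(t,T)\) is the function \(B(r,t,T)\) evaluated at \(r(t), t, T\), i.e.,

\[
B(t,T) = B(r(t),t,T).
\]

Because \(\exp\left\{-\int_0^t r(u)\,du\right\}B(r(t),t,T)\) is a martingale, its differential has no \(dt\) term. We compute

\[
d\left(\exp\left\{-\int_0^t r(u)\,du\right\}B(r(t),t,T)\right) = \exp\left\{-\int_0^t r(u)\,du\right\}\left[-r(t)B\,dt + B_r\,dr + \frac12 B_{rr}\,dr\,dr + B_t\,dt\right].
\]

The expression in \([\dots]\) equals

\[
-rB\,dt + B_r(\alpha - \beta r)\,dt + B_r\sigma\sqrt{r}\,dW + \frac12 B_{rr}\sigma^2 r\,dt + B_t\,dt.
\]

Setting the \(dt\) term to zero, we obtain the partial differential equation

\[
-rB(r,t,T) + B_t(r,t,T) + (\alpha - \beta r)B_r(r,t,T) + \frac12\sigma^2 rB_{rr}(r,t,T) = 0, \quad 0 \le t < T,\ r \ge 0. \tag{4.1}
\]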


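The terminal condition is

\[
B(r,T,T) = 1, \quad r \ge 0.
\]

Surprisingly, this equation has a closed form solution. Using the Hull & White model as a guide, we look for a solution of the form

\[
B(r,t,T) = e^{-rC(t,T) - A(t,T)},
\]

where \(C(T,T) = 0\), \(A(T,T) = 0\). Then we have

\[
B_t = (-rC_t - A_t)B, \qquad B_r = -CB, \qquad B_{rr} = C^2B,
\]

and the partial differential equation becomes

\[
\begin{aligned}
0 &= -rB + (-rC_t - A_t)B - (\alpha - \beta r)CB + \frac12\sigma^2 rC^2B \\
&= rB\left(-1 - C_t + \beta C + \frac12\sigma^2C^2\right) - B\left(A_t + \alpha C\right).
\end{aligned}
\]

We first solve the ordinary differential equation

\[
-1 - C_t(t,T) + \beta C(t,T) + \frac12\sigma^2C^2(t,T) = 0, \qquad C(T,T) = 0,
\]

and then set

\[
A(t,T) = \alpha\int_t^T C(u,T)\,du,
\]

so \(A(T,T) = 0\) and

\[
A_t(t,T) = -\alpha C(t,T).
\]

It is tedious but straightforward to check that the solutions are given by

\[
C(t,T) = \frac{\sinh(\gamma(T-t))}{\gamma\cosh(\gamma(T-t)) + \frac12\beta\sinh(\gamma(T-t))},
\]
\[
A(t,T) = -\frac{2\alpha}{\sigma^2}\log\left[\frac{\gamma e^{\frac12\beta(T-t)}}{\gamma\cosh(\gamma(T-t)) + \frac12\beta\sinh(\gamma(T-t))}\right],
\]

where

\[
\gamma = \frac12\sqrt{\beta^2 + 2\sigma^2}, \qquad \sinh u = \frac{e^u - e^{-u}}{2}, \qquad \cosh u = \frac{e^u + e^{-u}}{2}.
\]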


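Thus in the CIR model, we have

\[
\mathbb{E}\left[\exp\left\{-\int_t^T r(u)\,du\right\}\,\Big|\,\mathcal{F}(t)\right] = B(r(t),t,T),
\]

where

\[
B(r,t,T) = \exp\{-rC(t,T) - A(t,T)\}, \quad 0 \le t \le T,\ r \ge 0,
\]

and \(C(t,T)\) and \(A(t,T)\) are given by the formulas above. Because the coefficients in

\[
dr(t) = \left(\alpha - \beta r(t)\right) dt + \sigma\sqrt{r(t)}\,dW(t)
\]

do not depend on \(t\), the function \(B(r,t,T)\) depends on \(t\) and \(T\) only through their difference \(\tau = T - t\). Similarly, \(C(t,T)\) and \(A(t,T)\) are functions of \(\tau = T - t\). We write \(B(r,\tau)\) instead of \(B(r,t,T)\), and we have

\[
B(r,\tau) = \exp\{-rC(\tau) - A(\tau)\}, \quad \tau \ge 0,\ r \ge 0,
\]

where

\[
C(\tau) = \frac{\sinh(\gamma\tau)}{\gamma\cosh(\gamma\tau) + \frac12\beta\sinh(\gamma\tau)},
\qquad
A(\tau) = -\frac{2\alpha}{\sigma^2}\log\left[\frac{\gamma e^{\frac12\beta\tau}}{\gamma\cosh(\gamma\tau) + \frac12\beta\sinh(\gamma\tau)}\right],
\qquad
\gamma = \frac12\sqrt{\beta^2 + 2\sigma^2}.
\]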
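The "tedious but straightforward" check can at least be done numerically. In the variable \(\tau = T - t\), the ordinary differential equations above become \(C'(\tau) = 1 - \beta C - \frac12\sigma^2C^2\) and \(A'(\tau) = \alpha C(\tau)\), with \(C(0) = A(0) = 0\). The sketch verifies these with central differences for hypothetical parameter values and evaluates one bond price.

```python
import math

ALPHA, BETA, SIGMA = 0.02, 0.5, 0.1                   # hypothetical CIR parameters
GAMMA = 0.5 * math.sqrt(BETA**2 + 2 * SIGMA**2)

def C(tau):
    s, c = math.sinh(GAMMA * tau), math.cosh(GAMMA * tau)
    return s / (GAMMA * c + 0.5 * BETA * s)

def A(tau):
    s, c = math.sinh(GAMMA * tau), math.cosh(GAMMA * tau)
    return -2 * ALPHA / SIGMA**2 * math.log(GAMMA * math.exp(0.5 * BETA * tau) / (GAMMA * c + 0.5 * BETA * s))

def bond(r, tau):
    """B(r, tau) = exp{-r C(tau) - A(tau)}."""
    return math.exp(-r * C(tau) - A(tau))

# Residuals of C' = 1 - beta C - sigma^2 C^2 / 2 and A' = alpha C, by central differences.
h = 1e-5
residuals = []
for tau in (0.5, 2.0, 10.0):
    dC = (C(tau + h) - C(tau - h)) / (2 * h)
    dA = (A(tau + h) - A(tau - h)) / (2 * h)
    residuals.append((dC - (1 - BETA * C(tau) - 0.5 * SIGMA**2 * C(tau) ** 2), dA - ALPHA * C(tau)))
print(residuals)
print(bond(0.04, 5.0))
```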
We have

\[
B(r(0),T) = \mathbb{E}\exp\left\{-\int_0^T r(u)\,du\right\}.
\]

Now \(r(u) > 0\) for each \(u\), almost surely, so \(B(r(0),T)\) is strictly decreasing in \(T\). Moreover,

\[
\lim_{T\to\infty} B(r(0),T) = \mathbb{E}\exp\left\{-\int_0^\infty r(u)\,du\right\} = 0.
\]

But also,

\[
B(r(0),T) = \exp\{-r(0)C(T) - A(T)\},
\]
\[
r(0)C(0) + A(0) = 0,
\]
\[
\lim_{T\to\infty}\left[r(0)C(T) + A(T)\right] = \infty,
\]

and

\[
r(0)C(T) + A(T)
\]

is strictly increasing in \(T\).


31.5 Option on a bond

The value at time \(t\) of an option on a bond in the CIR model is

\[
v(t,r(t)) = \mathbb{E}\left[\exp\left\{-\int_t^{T_1} r(u)\,du\right\}\left(B(T_1,T_2) - K\right)^+\,\Big|\,\mathcal{F}(t)\right],
\]

where \(T_1\) is the expiration time of the option, \(T_2\) is the maturity time of the bond, and \(0 \le t \le T_1 \le T_2\). As usual, \(\exp\left\{-\int_0^t r(u)\,du\right\}v(t,r(t))\) is a martingale, and this leads to the partial differential equation

\[
-rv + v_t + (\alpha - \beta r)v_r + \frac12\sigma^2 r\,v_{rr} = 0, \quad 0 \le t < T_1,\ r \ge 0,
\]

where \(v = v(t,r)\). The terminal condition is

\[
v(T_1,r) = \left(B(r,T_1,T_2) - K\right)^+, \quad r \ge 0.
\]

Other European derivative securities on the bond are priced using the same partial differential equation with the terminal condition appropriate for the particular security.


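31.6 Deterministic time change of CIR model

Process time scale: In this time scale, the interest rate \(r(t)\) is given by the constant coefficient CIR equation

\[
dr(t) = \left(\alpha - \beta r(t)\right) dt + \sigma\sqrt{r(t)}\,dW(t).
\]

Real time scale: In this time scale, the interest rate \(\hat r(\hat t)\) is given by a time-dependent CIR equation

\[
d\hat r(\hat t) = \left(\hat\alpha(\hat t) - \hat\beta(\hat t)\hat r(\hat t)\right) d\hat t + \hat\sigma(\hat t)\sqrt{\hat r(\hat t)}\,d\widehat{W}(\hat t).
\]

Figure 31.3: Time change function. (Axis labels: \(t\), process time; \(\hat t\), real time. Annotation: a period of high interest rate volatility.)

There is a strictly increasing time change function \(t = \varphi(\hat t)\) which relates the two time scales (see Fig. 31.3).

Let \(\hat B(\hat r,\hat t,\hat T)\) denote the price at real time \(\hat t\) of a bond with maturity \(\hat T\) when the interest rate at time \(\hat t\) is \(\hat r\). We want to set things up so that

\[
\hat B(\hat r,\hat t,\hat T) = B(r,t,T) = e^{-rC(t,T) - A(t,T)},
\]

where \(t = \varphi(\hat t)\), \(T = \varphi(\hat T)\), and \(C(t,T)\) and \(A(t,T)\) are as defined previously.

We need to determine the relationship between \(\hat r\) and \(r\). We have

\[
B(r(0),0,T) = \mathbb{E}\exp\left\{-\int_0^T r(t)\,dt\right\}.
\]

With \(T = \varphi(\hat T)\) (and \(\varphi(0) = 0\)), make the change of variable \(t = \varphi(\hat t)\), \(dt = \varphi'(\hat t)\,d\hat t\) in the integral to get

\[
B(r(0),0,T) = \mathbb{E}\exp\left\{-\int_0^{\hat T} r(\varphi(\hat t))\varphi'(\hat t)\,d\hat t\right\},
\]

and this will be \(\hat B(\hat r(0),0,\hat T)\) if we set

\[
\hat r(\hat t) = \varphi'(\hat t)\,r(\varphi(\hat t)).
\]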
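31.7 Calibration

\[
\hat B(\hat r(\hat t),\hat t,\hat T) = B\left(\frac{\hat r(\hat t)}{\varphi'(\hat t)},\varphi(\hat t),\varphi(\hat T)\right)
= \exp\left\{-\frac{\hat r(\hat t)}{\varphi'(\hat t)}C(\varphi(\hat t),\varphi(\hat T)) - A(\varphi(\hat t),\varphi(\hat T))\right\}
= \exp\left\{-\hat r(\hat t)\hat C(\hat t,\hat T) - \hat A(\hat t,\hat T)\right\},
\]

where

\[
\hat C(\hat t,\hat T) = \frac{C(\varphi(\hat t),\varphi(\hat T))}{\varphi'(\hat t)}, \qquad \hat A(\hat t,\hat T) = A(\varphi(\hat t),\varphi(\hat T))
\]

do not depend on \(\hat t\) and \(\hat T\) only through \(\hat T - \hat t\), since, in the real time scale, the model coefficients are time dependent.

Suppose we know \(\hat r(0)\) and \(\hat B(\hat r(0),0,\hat T)\) for all \(\hat T \in [0,\hat T^*]\). We calibrate by writing the equation

\[
\hat B(\hat r(0),0,\hat T) = \exp\left\{-\hat r(0)\hat C(0,\hat T) - \hat A(0,\hat T)\right\},
\]

or equivalently,

\[
-\log\hat B(\hat r(0),0,\hat T) = \frac{\hat r(0)}{\varphi'(0)}C(\varphi(0),\varphi(\hat T)) + A(\varphi(0),\varphi(\hat T)).
\]

Take \(\alpha\), \(\beta\) and \(\sigma\) so the equilibrium distribution of \(r(t)\) seems reasonable. These values determine the functions \(C\), \(A\). Take \(\varphi'(0) = 1\) (we justify this in the next section). For each \(\hat T\), solve the equation for \(\varphi(\hat T)\):

\[
-\log\hat B(\hat r(0),0,\hat T) = \hat r(0)C(0,\varphi(\hat T)) + A(0,\varphi(\hat T)). \tag{*}
\]

The right-hand side of this equation is increasing in the \(\varphi(\hat T)\) variable, starting at 0 at time 0 and having limit \(\infty\) at \(\infty\), i.e.,

\[
\hat r(0)C(0,0) + A(0,0) = 0, \qquad \lim_{T\to\infty}\left[\hat r(0)C(0,T) + A(0,T)\right] = \infty.
\]

Since \(0 \le -\log\hat B(\hat r(0),0,\hat T) < \infty\), (*) has a unique solution for each \(\hat T\). For \(\hat T = 0\), this solution is \(\varphi(0) = 0\). If \(\hat T_1 < \hat T_2\), then

\[
-\log\hat B(\hat r(0),0,\hat T_1) < -\log\hat B(\hat r(0),0,\hat T_2),
\]

so \(\varphi(\hat T_1) < \varphi(\hat T_2)\). Thus \(\varphi\) is a strictly increasing time-change function with the right properties.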
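Because the right-hand side of (*) is continuous and increases from 0 to \(\infty\), the equation can be solved for \(\varphi(\hat T)\) by bisection. The sketch below does this for hypothetical CIR parameters and a hypothetical market curve \(-\log\hat B(\hat r(0),0,\hat T) = \hat r(0)\hat T + 0.002\hat T^2\), which is not real data. It also prints the difference quotient \(\varphi(\hat T)/\hat T\) for small \(\hat T\), for comparison with \(\varphi'(0) = 1\). The names `phi`, `model_minus_log_bond` and `market_minus_log_bond` are illustrative.

```python
import math

# Hypothetical CIR parameters and market curve (assumptions for illustration, not data).
ALPHA, BETA, SIGMA = 0.02, 0.5, 0.1
GAMMA = 0.5 * math.sqrt(BETA**2 + 2 * SIGMA**2)
R_HAT_0 = 0.03

def C(tau):
    s, c = math.sinh(GAMMA * tau), math.cosh(GAMMA * tau)
    return s / (GAMMA * c + 0.5 * BETA * s)

def A(tau):
    s, c = math.sinh(GAMMA * tau), math.cosh(GAMMA * tau)
    return -2 * ALPHA / SIGMA**2 * math.log(GAMMA * math.exp(0.5 * BETA * tau) / (GAMMA * c + 0.5 * BETA * s))

def model_minus_log_bond(T):
    """Right side of (*) as a function of T = phi(T_hat): r_hat(0) C(0,T) + A(0,T)."""
    return R_HAT_0 * C(T) + A(T)

def market_minus_log_bond(T_hat):
    """Hypothetical -log B_hat(r_hat(0), 0, T_hat): 0 at 0, increasing, slope r_hat(0) at 0."""
    return R_HAT_0 * T_hat + 0.002 * T_hat**2

def phi(T_hat, tol=1e-12):
    """Solve (*) for phi(T_hat) by bisection; the model side increases from 0 to infinity."""
    target = market_minus_log_bond(T_hat)
    lo, hi = 0.0, 1.0
    while model_minus_log_bond(hi) < target:     # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if model_minus_log_bond(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

grid = [0.0, 1.0, 2.0, 5.0, 10.0]
phis = [phi(T) for T in grid]
print(phis)
print(phi(1e-4) / 1e-4)                            # difference quotient near 0
```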
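31.8 Tracking down \(\varphi'(0)\) in the time change of the CIR model

Result for general term structure models:

\[
-\frac{\partial}{\partial T}\log B(0,T)\Big|_{T=0} = r(0).
\]

Justification:

\[
B(0,T) = \mathbb{E}\exp\left\{-\int_0^T r(u)\,du\right\},
\]
\[
-\log B(0,T) = -\log\mathbb{E}\exp\left\{-\int_0^T r(u)\,du\right\},
\]
\[
-\frac{\partial}{\partial T}\log B(0,T)\Big|_{T=0}
= \frac{\mathbb{E}\left[r(T)\exp\left\{-\int_0^T r(u)\,du\right\}\right]}{\mathbb{E}\exp\left\{-\int_0^T r(u)\,du\right\}}\Bigg|_{T=0} = r(0).
\]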


In the real time scale associated with the calibration of CIR by time change, we write the bond price 
as 


B(F(0), 0,7), 
thereby indicating explicitly the initial interest rate. The above says that 
a a Z 
—— log B(F(0),0, T = F(0). 
sp oe BUF(O).0,7)] 70) 


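The calibration of CIR by time change requires that we find a strictly increasing function $\varphi$ with $\varphi(0) = 0$ such that

$$-\log\hat B(\hat r(0),0,\hat T) = \frac{1}{\varphi'(0)}\,\hat r(0)\,C(\varphi(\hat T)) + A(\varphi(\hat T)),\qquad \hat T \ge 0, \tag{cal}$$

where $\hat B(\hat r(0),0,\hat T)$, determined by market data, is strictly decreasing in $\hat T$, starts at 1 when $\hat T = 0$, and goes to zero as $\hat T\to\infty$. Therefore, $-\log\hat B(\hat r(0),0,\hat T)$ is as shown in Fig. 31.4.

Consider the function

$$\hat r(0)\,C(T) + A(T).$$

Here $C(T)$ and $A(T)$ are given by

$$C(T) = \frac{\sinh(\gamma T)}{\gamma\cosh(\gamma T) + \frac12\beta\sinh(\gamma T)},\qquad A(T) = -\frac{2\alpha}{\sigma^2}\log\left[\frac{\gamma\,e^{\frac12\beta T}}{\gamma\cosh(\gamma T) + \frac12\beta\sinh(\gamma T)}\right],$$

$$\gamma = \tfrac12\sqrt{\beta^2 + 2\sigma^2}.$$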
Figure 31.4: Bond price in CIR model ($-\log\hat B(\hat r(0),0,\hat T)$ against $\hat T$: strictly increasing, goes to $\infty$)

Figure 31.5: Calibration


The function $\hat r(0)C(T) + A(T)$ is zero at $T = 0$, is strictly increasing in $T$, and goes to $\infty$ as $T\to\infty$. This is because the interest rate is positive in the CIR model (see last paragraph of Section 31.4).

To solve (cal), let us first consider the related equation

$$-\log\hat B(\hat r(0),0,\hat T) = \hat r(0)\,C(\varphi(\hat T)) + A(\varphi(\hat T)). \tag{cal'}$$

Fix $\hat T$ and define $\varphi(\hat T)$ to be the unique $T$ for which (see Fig. 31.5)

$$-\log\hat B(\hat r(0),0,\hat T) = \hat r(0)\,C(T) + A(T).$$

If $\hat T = 0$, then $\varphi(\hat T) = 0$. If $\hat T_1 < \hat T_2$, then $\varphi(\hat T_1) < \varphi(\hat T_2)$. As $\hat T\to\infty$, $\varphi(\hat T)\to\infty$. We have thus defined a time-change function $\varphi$ which has all the right properties, except that it satisfies (cal') rather than (cal).
We conclude by showing that $\varphi'(0) = 1$, so $\varphi$ also satisfies (cal). From (cal') we compute

$$\begin{aligned}
\hat r(0) &= -\frac{\partial}{\partial\hat T}\log\hat B(\hat r(0),0,\hat T)\Big|_{\hat T=0} \\
&= \hat r(0)\,C'(\varphi(0))\,\varphi'(0) + A'(\varphi(0))\,\varphi'(0) \\
&= \hat r(0)\,C'(0)\,\varphi'(0) + A'(0)\,\varphi'(0).
\end{aligned}$$

We show in a moment that $C'(0) = 1$, $A'(0) = 0$, so we have

$$\hat r(0) = \hat r(0)\,\varphi'(0).$$

Note that $\hat r(0)$ is the initial interest rate, observed in the market, and is strictly positive. Dividing by $\hat r(0)$, we obtain

$$\varphi'(0) = 1.$$


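Computation of $C'(0)$:

$$C'(T) = \frac{\gamma\cosh(\gamma T)\big(\gamma\cosh(\gamma T) + \frac12\beta\sinh(\gamma T)\big) - \sinh(\gamma T)\big(\gamma^2\sinh(\gamma T) + \frac12\beta\gamma\cosh(\gamma T)\big)}{\big(\gamma\cosh(\gamma T) + \frac12\beta\sinh(\gamma T)\big)^2},$$

$$C'(0) = \frac{1}{\gamma^2}\Big[\gamma(\gamma + 0) - 0\cdot\big(0 + \tfrac12\beta\gamma\big)\Big] = 1.$$

Computation of $A'(0)$:

$$A'(T) = -\frac{2\alpha}{\sigma^2}\cdot\frac{\gamma\cosh(\gamma T) + \frac12\beta\sinh(\gamma T)}{\gamma\,e^{\frac12\beta T}}\left[\frac{\frac12\beta\gamma\,e^{\frac12\beta T}}{\gamma\cosh(\gamma T) + \frac12\beta\sinh(\gamma T)} - \frac{\gamma\,e^{\frac12\beta T}\big(\gamma^2\sinh(\gamma T) + \frac12\beta\gamma\cosh(\gamma T)\big)}{\big(\gamma\cosh(\gamma T) + \frac12\beta\sinh(\gamma T)\big)^2}\right],$$

$$A'(0) = -\frac{2\alpha}{\sigma^2}\cdot\frac{\gamma}{\gamma}\left[\frac{\frac12\beta\gamma}{\gamma} - \frac{\gamma\big(0 + \frac12\beta\gamma\big)}{\gamma^2}\right] = -\frac{2\alpha}{\sigma^2}\Big[\tfrac12\beta - \tfrac12\beta\Big] = 0.$$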
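As a quick numerical cross-check of these two computations (an illustration only), central difference quotients of the closed-form $C$ and $A$ at $T = 0$ should come out close to 1 and 0. The helper names and the step size $h$ are our own choices; evaluating at $T = -h$ just reuses the same formulas, which stay well defined for small negative $T$.

```python
import math


def cir_C_A(T, alpha, beta, sigma):
    """CIR coefficients C(T), A(T) with gamma = sqrt(beta^2 + 2 sigma^2) / 2."""
    gamma = 0.5 * math.sqrt(beta**2 + 2.0 * sigma**2)
    denom = gamma * math.cosh(gamma * T) + 0.5 * beta * math.sinh(gamma * T)
    C = math.sinh(gamma * T) / denom
    A = -(2.0 * alpha / sigma**2) * math.log(gamma * math.exp(0.5 * beta * T) / denom)
    return C, A


def slopes_at_zero(alpha, beta, sigma, h=1e-5):
    """Central differences (f(h) - f(-h)) / (2h) for f = C and for f = A."""
    C_plus, A_plus = cir_C_A(h, alpha, beta, sigma)
    C_minus, A_minus = cir_C_A(-h, alpha, beta, sigma)
    return (C_plus - C_minus) / (2.0 * h), (A_plus - A_minus) / (2.0 * h)


dC0, dA0 = slopes_at_zero(alpha=0.02, beta=0.2, sigma=0.1)  # approximately 1 and 0
```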


Chapter 32 


A two-factor model (Duffie & Kan) 


Let us define: 


$X_1(t)$ = Interest rate at time $t$

$X_2(t)$ = Yield at time $t$ on a bond maturing at time $t + \tau_0$

Let $X_1(0) > 0$, $X_2(0) > 0$ be given, and let $X_1(t)$ and $X_2(t)$ be given by the coupled stochastic differential equations

$$dX_1(t) = \big(a_{11}X_1(t) + a_{12}X_2(t) + b_1\big)\,dt + \sigma_1\sqrt{\beta_1X_1(t) + \beta_2X_2(t) + \alpha}\;dW_1(t), \tag{SDE1}$$

$$dX_2(t) = \big(a_{21}X_1(t) + a_{22}X_2(t) + b_2\big)\,dt + \sigma_2\sqrt{\beta_1X_1(t) + \beta_2X_2(t) + \alpha}\;\Big(\rho\,dW_1(t) + \sqrt{1-\rho^2}\,dW_2(t)\Big), \tag{SDE2}$$

where $W_1$ and $W_2$ are independent Brownian motions. To simplify notation, we define

$$Y(t) \triangleq \beta_1X_1(t) + \beta_2X_2(t) + \alpha,\qquad W_3(t) \triangleq \rho W_1(t) + \sqrt{1-\rho^2}\,W_2(t).$$

Then $W_3$ is a Brownian motion with

$$dW_1(t)\,dW_3(t) = \rho\,dt,$$

and

$$dX_1\,dX_1 = \sigma_1^2Y\,dt,\qquad dX_2\,dX_2 = \sigma_2^2Y\,dt,\qquad dX_1\,dX_2 = \rho\sigma_1\sigma_2Y\,dt.$$
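32.1 Non-negativity of Y

$$\begin{aligned}
dY &= \beta_1\,dX_1 + \beta_2\,dX_2 \\
&= (\beta_1a_{11}X_1 + \beta_1a_{12}X_2 + \beta_1b_1)\,dt + (\beta_2a_{21}X_1 + \beta_2a_{22}X_2 + \beta_2b_2)\,dt \\
&\quad + \sqrt{Y}\,\big(\beta_1\sigma_1\,dW_1 + \beta_2\rho\sigma_2\,dW_1 + \beta_2\sqrt{1-\rho^2}\,\sigma_2\,dW_2\big) \\
&= \big[(\beta_1a_{11} + \beta_2a_{21})X_1 + (\beta_1a_{12} + \beta_2a_{22})X_2\big]\,dt + (\beta_1b_1 + \beta_2b_2)\,dt \\
&\quad + \big(\beta_1^2\sigma_1^2 + 2\beta_1\beta_2\rho\sigma_1\sigma_2 + \beta_2^2\sigma_2^2\big)^{1/2}\sqrt{Y(t)}\,dW_4(t),
\end{aligned}$$

where

$$W_4(t) = \frac{(\beta_1\sigma_1 + \beta_2\rho\sigma_2)\,W_1(t) + \beta_2\sqrt{1-\rho^2}\,\sigma_2\,W_2(t)}{\sqrt{\beta_1^2\sigma_1^2 + 2\beta_1\beta_2\rho\sigma_1\sigma_2 + \beta_2^2\sigma_2^2}}$$

is a Brownian motion. We shall choose the parameters so that:

Assumption 1: For some $\gamma$, $\beta_1a_{11} + \beta_2a_{21} = \gamma\beta_1$ and $\beta_1a_{12} + \beta_2a_{22} = \gamma\beta_2$.

Then

$$\begin{aligned}
dY &= \big[\gamma\beta_1X_1 + \gamma\beta_2X_2 + \gamma\alpha\big]\,dt + (\beta_1b_1 + \beta_2b_2 - \gamma\alpha)\,dt + \big(\beta_1^2\sigma_1^2 + 2\beta_1\beta_2\rho\sigma_1\sigma_2 + \beta_2^2\sigma_2^2\big)^{1/2}\sqrt{Y}\,dW_4 \\
&= \gamma Y\,dt + (\beta_1b_1 + \beta_2b_2 - \gamma\alpha)\,dt + \big(\beta_1^2\sigma_1^2 + 2\beta_1\beta_2\rho\sigma_1\sigma_2 + \beta_2^2\sigma_2^2\big)^{1/2}\sqrt{Y}\,dW_4.
\end{aligned}$$

From our discussion of the CIR process, we recall that $Y$ will stay strictly positive provided that:

Assumption 2: $Y(0) = \beta_1X_1(0) + \beta_2X_2(0) + \alpha > 0$,

and

Assumption 3: $\beta_1b_1 + \beta_2b_2 - \gamma\alpha \ge \frac12\big(\beta_1^2\sigma_1^2 + 2\beta_1\beta_2\rho\sigma_1\sigma_2 + \beta_2^2\sigma_2^2\big)$.

Under Assumptions 1, 2, and 3,

$$Y(t) > 0,\qquad 0 \le t < \infty,\quad\text{almost surely},$$

and (SDE1) and (SDE2) make sense. These can be rewritten as

$$dX_1(t) = \big(a_{11}X_1(t) + a_{12}X_2(t) + b_1\big)\,dt + \sigma_1\sqrt{Y(t)}\,dW_1(t), \tag{SDE1'}$$

$$dX_2(t) = \big(a_{21}X_1(t) + a_{22}X_2(t) + b_2\big)\,dt + \sigma_2\sqrt{Y(t)}\,dW_3(t). \tag{SDE2'}$$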
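32.2 Zero-coupon bond prices

The value at time $t \le T$ of a zero-coupon bond paying one dollar at time $T$ is

$$B(t,T) = \mathbb E\Big[\exp\Big\{-\int_t^T X_1(u)\,du\Big\}\,\Big|\,\mathcal F(t)\Big].$$

Since the pair $(X_1,X_2)$ of processes is Markov, this is random only through a dependence on $X_1(t)$, $X_2(t)$. Since the coefficients in (SDE1) and (SDE2) do not depend on time, the bond price depends on $t$ and $T$ only through their difference $\tau = T - t$. Thus, there is a function $B(x_1,x_2,\tau)$ of the dummy variables $x_1$, $x_2$ and $\tau$, so that

$$B(X_1(t),X_2(t),T-t) = \mathbb E\Big[\exp\Big\{-\int_t^T X_1(u)\,du\Big\}\,\Big|\,\mathcal F(t)\Big].$$

The usual tower property argument shows that

$$\exp\Big\{-\int_0^t X_1(u)\,du\Big\}\,B(X_1(t),X_2(t),T-t)$$

is a martingale. We compute its stochastic differential and set the $dt$ term equal to zero.

$$\begin{aligned}
d\Big(e^{-\int_0^t X_1(u)\,du}\,B(X_1(t),X_2(t),T-t)\Big)
&= e^{-\int_0^t X_1(u)\,du}\Big[-X_1B\,dt + B_{x_1}\,dX_1 + B_{x_2}\,dX_2 - B_\tau\,dt \\
&\qquad + \tfrac12B_{x_1x_1}\,dX_1\,dX_1 + B_{x_1x_2}\,dX_1\,dX_2 + \tfrac12B_{x_2x_2}\,dX_2\,dX_2\Big] \\
&= e^{-\int_0^t X_1(u)\,du}\Big[\Big(-X_1B + (a_{11}X_1 + a_{12}X_2 + b_1)B_{x_1} + (a_{21}X_1 + a_{22}X_2 + b_2)B_{x_2} - B_\tau \\
&\qquad + \tfrac12\sigma_1^2YB_{x_1x_1} + \rho\sigma_1\sigma_2YB_{x_1x_2} + \tfrac12\sigma_2^2YB_{x_2x_2}\Big)\,dt \\
&\qquad + \sigma_1\sqrt Y\,B_{x_1}\,dW_1 + \sigma_2\sqrt Y\,B_{x_2}\,dW_3\Big].
\end{aligned}$$

The partial differential equation for $B(x_1,x_2,\tau)$ is

$$\begin{aligned}
&-x_1B - B_\tau + (a_{11}x_1 + a_{12}x_2 + b_1)B_{x_1} + (a_{21}x_1 + a_{22}x_2 + b_2)B_{x_2} + \tfrac12\sigma_1^2(\beta_1x_1 + \beta_2x_2 + \alpha)B_{x_1x_1} \\
&\quad + \rho\sigma_1\sigma_2(\beta_1x_1 + \beta_2x_2 + \alpha)B_{x_1x_2} + \tfrac12\sigma_2^2(\beta_1x_1 + \beta_2x_2 + \alpha)B_{x_2x_2} = 0.
\end{aligned}\tag{PDE}$$

We seek a solution of the form

$$B(x_1,x_2,\tau) = \exp\{-x_1C_1(\tau) - x_2C_2(\tau) - A(\tau)\},$$

valid for all $\tau \ge 0$ and all $x_1$, $x_2$ satisfying

$$\beta_1x_1 + \beta_2x_2 + \alpha > 0. \tag{*}$$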
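We must have

$$B(x_1,x_2,0) = 1\qquad\forall\,x_1,x_2\text{ satisfying }(*),$$

because $\tau = 0$ corresponds to $t = T$. This implies the initial conditions

$$C_1(0) = C_2(0) = A(0) = 0. \tag{IC}$$

We want to find $C_1(\tau)$, $C_2(\tau)$, $A(\tau)$ for $\tau > 0$. We have

$$B_\tau(x_1,x_2,\tau) = \big[-x_1C_1'(\tau) - x_2C_2'(\tau) - A'(\tau)\big]\,B(x_1,x_2,\tau),$$

$$B_{x_1} = -C_1(\tau)B,\quad B_{x_2} = -C_2(\tau)B,\quad B_{x_1x_1} = C_1^2(\tau)B,\quad B_{x_1x_2} = C_1(\tau)C_2(\tau)B,\quad B_{x_2x_2} = C_2^2(\tau)B.$$

(PDE) becomes

$$\begin{aligned}
0 &= B(x_1,x_2,\tau)\Big[-x_1 + x_1C_1'(\tau) + x_2C_2'(\tau) + A'(\tau) - (a_{11}x_1 + a_{12}x_2 + b_1)C_1(\tau) - (a_{21}x_1 + a_{22}x_2 + b_2)C_2(\tau) \\
&\qquad + \tfrac12\sigma_1^2(\beta_1x_1 + \beta_2x_2 + \alpha)C_1^2(\tau) + \rho\sigma_1\sigma_2(\beta_1x_1 + \beta_2x_2 + \alpha)C_1(\tau)C_2(\tau) + \tfrac12\sigma_2^2(\beta_1x_1 + \beta_2x_2 + \alpha)C_2^2(\tau)\Big] \\
&= x_1B\Big[-1 + C_1'(\tau) - a_{11}C_1(\tau) - a_{21}C_2(\tau) + \tfrac12\sigma_1^2\beta_1C_1^2(\tau) + \rho\sigma_1\sigma_2\beta_1C_1(\tau)C_2(\tau) + \tfrac12\sigma_2^2\beta_1C_2^2(\tau)\Big] \\
&\quad + x_2B\Big[C_2'(\tau) - a_{12}C_1(\tau) - a_{22}C_2(\tau) + \tfrac12\sigma_1^2\beta_2C_1^2(\tau) + \rho\sigma_1\sigma_2\beta_2C_1(\tau)C_2(\tau) + \tfrac12\sigma_2^2\beta_2C_2^2(\tau)\Big] \\
&\quad + B\Big[A'(\tau) - b_1C_1(\tau) - b_2C_2(\tau) + \tfrac12\sigma_1^2\alpha C_1^2(\tau) + \rho\sigma_1\sigma_2\alpha C_1(\tau)C_2(\tau) + \tfrac12\sigma_2^2\alpha C_2^2(\tau)\Big].
\end{aligned}$$

Since this must hold for all $(x_1,x_2)$ satisfying (*), which is an open set, each bracket must vanish. We get three equations:

$$C_1'(\tau) = 1 + a_{11}C_1(\tau) + a_{21}C_2(\tau) - \tfrac12\sigma_1^2\beta_1C_1^2(\tau) - \rho\sigma_1\sigma_2\beta_1C_1(\tau)C_2(\tau) - \tfrac12\sigma_2^2\beta_1C_2^2(\tau),\qquad C_1(0) = 0; \tag{1}$$

$$C_2'(\tau) = a_{12}C_1(\tau) + a_{22}C_2(\tau) - \tfrac12\sigma_1^2\beta_2C_1^2(\tau) - \rho\sigma_1\sigma_2\beta_2C_1(\tau)C_2(\tau) - \tfrac12\sigma_2^2\beta_2C_2^2(\tau),\qquad C_2(0) = 0; \tag{2}$$

$$A'(\tau) = b_1C_1(\tau) + b_2C_2(\tau) - \tfrac12\sigma_1^2\alpha C_1^2(\tau) - \rho\sigma_1\sigma_2\alpha C_1(\tau)C_2(\tau) - \tfrac12\sigma_2^2\alpha C_2^2(\tau),\qquad A(0) = 0. \tag{3}$$

We first solve (1) and (2) simultaneously numerically, and then integrate (3) to obtain the function $A(\tau)$.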
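One way to carry out this numerical step is sketched below. It is a minimal illustration under stated assumptions: the classical fourth-order Runge-Kutta scheme is our choice (any ODE solver would do), the parameter dictionary keys simply mirror the symbols $a_{11},\ldots,\sigma_2,\rho$ of (SDE1)-(SDE2), and `duffie_kan_coefficients` and `bond_price` are hypothetical helper names. The sketch exploits the fact that in (1)-(3) the quadratic terms share the common factor $\frac12\sigma_1^2C_1^2 + \rho\sigma_1\sigma_2C_1C_2 + \frac12\sigma_2^2C_2^2$, multiplied by $\beta_1$, $\beta_2$ and $\alpha$ respectively.

```python
import math


def duffie_kan_coefficients(tau, p, n_steps=1000):
    """Integrate (1)-(3) from 0 to tau with classical RK4; return (C1, C2, A) at tau.

    p: dict with keys a11, a12, a21, a22, b1, b2, beta1, beta2, alpha,
       sigma1, sigma2, rho (mirroring the symbols in (SDE1)-(SDE2)).
    """
    def rhs(state):
        c1, c2, _ = state
        # Common quadratic factor shared by all three equations.
        quad = (0.5 * p["sigma1"] ** 2 * c1 ** 2
                + p["rho"] * p["sigma1"] * p["sigma2"] * c1 * c2
                + 0.5 * p["sigma2"] ** 2 * c2 ** 2)
        dc1 = 1.0 + p["a11"] * c1 + p["a21"] * c2 - p["beta1"] * quad  # (1)
        dc2 = p["a12"] * c1 + p["a22"] * c2 - p["beta2"] * quad        # (2)
        da = p["b1"] * c1 + p["b2"] * c2 - p["alpha"] * quad           # (3)
        return (dc1, dc2, da)

    h = tau / n_steps
    y = (0.0, 0.0, 0.0)  # initial conditions (IC)
    for _ in range(n_steps):
        k1 = rhs(y)
        k2 = rhs(tuple(yi + 0.5 * h * ki for yi, ki in zip(y, k1)))
        k3 = rhs(tuple(yi + 0.5 * h * ki for yi, ki in zip(y, k2)))
        k4 = rhs(tuple(yi + h * ki for yi, ki in zip(y, k3)))
        y = tuple(yi + h / 6.0 * (s1 + 2.0 * s2 + 2.0 * s3 + s4)
                  for yi, s1, s2, s3, s4 in zip(y, k1, k2, k3, k4))
    return y


def bond_price(x1, x2, tau, p):
    """B(x1, x2, tau) = exp{-x1 C1(tau) - x2 C2(tau) - A(tau)}."""
    c1, c2, a = duffie_kan_coefficients(tau, p)
    return math.exp(-x1 * c1 - x2 * c2 - a)
```

A convenient check: with $\beta_1 = 1$, $\beta_2 = \alpha = 0$, $a_{11} = -\kappa$ (our notation) and the remaining drift coefficients zero, equation (1) reduces to $C_1' = 1 - \kappa C_1 - \frac12\sigma_1^2C_1^2$, which is solved by the one-factor CIR coefficient $\sinh(\gamma\tau)/(\gamma\cosh(\gamma\tau) + \frac12\kappa\sinh(\gamma\tau))$ with $\gamma = \frac12\sqrt{\kappa^2 + 2\sigma_1^2}$.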


32.3 Calibration 


Let $\tau_0 > 0$ be given. The value at time $t$ of a bond maturing at time $t + \tau_0$ is

$$B(X_1(t),X_2(t),\tau_0) = \exp\{-X_1(t)C_1(\tau_0) - X_2(t)C_2(\tau_0) - A(\tau_0)\}$$

and the yield is

$$-\frac{1}{\tau_0}\log B(X_1(t),X_2(t),\tau_0) = \frac{1}{\tau_0}\big[X_1(t)C_1(\tau_0) + X_2(t)C_2(\tau_0) + A(\tau_0)\big].$$

But we have set up the model so that $X_2(t)$ is the yield at time $t$ of a bond maturing at time $t + \tau_0$. Thus

$$X_2(t) = \frac{1}{\tau_0}\big[X_1(t)C_1(\tau_0) + X_2(t)C_2(\tau_0) + A(\tau_0)\big].$$

This equation must hold for every value of $X_1(t)$ and $X_2(t)$, which implies that

$$C_1(\tau_0) = 0,\qquad C_2(\tau_0) = \tau_0,\qquad A(\tau_0) = 0.$$

We must choose the parameters

$$a_{11}, a_{12}, b_1;\quad a_{21}, a_{22}, b_2;\quad \beta_1, \beta_2, \alpha;\quad \sigma_1, \rho, \sigma_2$$

so that these three equations are satisfied.
Chapter 33


Change of numéraire 


Consider a Brownian motion driven market model with time horizon $T^*$. For now, we will have one asset, which we call a "stock" even though in applications it will usually be an interest rate dependent claim. The price of the stock is modeled by

$$dS(t) = r(t)S(t)\,dt + \sigma(t)S(t)\,dW(t), \tag{0.1}$$

where the interest rate process $r(t)$ and the volatility process $\sigma(t)$ are adapted to some filtration $\{\mathcal F(t);\ 0 \le t \le T^*\}$. $W$ is a Brownian motion relative to this filtration, but $\{\mathcal F(t);\ 0 \le t \le T^*\}$ may be larger than the filtration generated by $W$.

This is not a geometric Brownian motion model. We are particularly interested in the case that the interest rate is stochastic, given by a term structure model we have not yet specified.

We shall work only under the risk-neutral measure, which is reflected by the fact that the mean rate of return for the stock is $r(t)$.


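We define the accumulation factor

$$\beta(t) = \exp\Big\{\int_0^t r(u)\,du\Big\},$$

so that the discounted stock price $S(t)/\beta(t)$ is a martingale. Indeed,

$$d\Big(\frac{S(t)}{\beta(t)}\Big) = \frac{S(t)}{\beta(t)}\,\sigma(t)\,dW(t).$$

The zero-coupon bond prices are given by

$$B(t,T) = \mathbb E\Big[\exp\Big\{-\int_t^T r(u)\,du\Big\}\,\Big|\,\mathcal F(t)\Big],$$

so

$$\frac{B(t,T)}{\beta(t)} = \mathbb E\Big[\frac{1}{\beta(T)}\,\Big|\,\mathcal F(t)\Big]$$

is also a martingale (tower property).

The $T$-forward price $F(t,T)$ of the stock is the price set at time $t$ for delivery of one share of stock at time $T$ with payment at time $T$. The value of the forward contract at time $t$ is zero, so

$$\begin{aligned}
0 &= \beta(t)\,\mathbb E\Big[\frac{S(T) - F(t,T)}{\beta(T)}\,\Big|\,\mathcal F(t)\Big] \\
&= \beta(t)\,\mathbb E\Big[\frac{S(T)}{\beta(T)}\,\Big|\,\mathcal F(t)\Big] - F(t,T)\,\beta(t)\,\mathbb E\Big[\frac{1}{\beta(T)}\,\Big|\,\mathcal F(t)\Big] \\
&= \beta(t)\,\frac{S(t)}{\beta(t)} - F(t,T)\,B(t,T) \\
&= S(t) - F(t,T)\,B(t,T).
\end{aligned}$$

Therefore,

$$F(t,T) = \frac{S(t)}{B(t,T)}.$$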


Definition 33.1 (Numéraire) Any asset in the model whose price is always strictly positive can be 
taken as the numéraire. We then denominate all other assets in units of this numéraire. 


Example 33.1 (Money market as numéraire) The money market could be the numéraire. At time $t$, the stock is worth $S(t)/\beta(t)$ units of money market and the $T$-maturity bond is worth $B(t,T)/\beta(t)$ units of money market.

Example 33.2 (Bond as numéraire) The $T$-maturity bond could be the numéraire. At time $t \le T$, the stock is worth $F(t,T)$ units of $T$-maturity bond and the $T$-maturity bond is worth 1 unit.

We will say that a probability measure $\mathbb P_N$ is risk-neutral for the numéraire $N$ if every asset price, divided by $N$, is a martingale under $\mathbb P_N$. The original probability measure $\mathbb P$ is risk-neutral for the numéraire $\beta$ (Example 33.1).

Theorem 0.71 Let $N$ be a numéraire, i.e., the price process for some asset whose price is always strictly positive. Then $\mathbb P_N$ defined by

$$\mathbb P_N(A) = \frac{1}{N(0)}\int_A \frac{N(T^*)}{\beta(T^*)}\,d\mathbb P,\qquad \forall A \in \mathcal F(T^*),$$

is risk-neutral for $N$.
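Note: $\mathbb P$ and $\mathbb P_N$ are equivalent, i.e., have the same probability zero sets, and

$$\mathbb P(A) = N(0)\int_A \frac{\beta(T^*)}{N(T^*)}\,d\mathbb P_N,\qquad \forall A \in \mathcal F(T^*).$$

Proof: Because $N$ is the price process for some asset, $N/\beta$ is a martingale under $\mathbb P$. Therefore,

$$\mathbb P_N(\Omega) = \frac{1}{N(0)}\int_\Omega \frac{N(T^*)}{\beta(T^*)}\,d\mathbb P = \frac{1}{N(0)}\,\mathbb E\Big[\frac{N(T^*)}{\beta(T^*)}\Big] = \frac{1}{N(0)}\cdot\frac{N(0)}{\beta(0)} = 1,$$

and we see that $\mathbb P_N$ is a probability measure.

Let $Y$ be an asset price. Under $\mathbb P$, $Y/\beta$ is a martingale. We must show that under $\mathbb P_N$, $Y/N$ is a martingale. For this, we need to recall how to combine conditional expectations with change of measure (Lemma 1.54). If $0 \le t \le T \le T^*$ and $X$ is $\mathcal F(T)$-measurable, then

$$\mathbb E_N\big[X\,\big|\,\mathcal F(t)\big] = \frac{N(0)\,\beta(t)}{N(t)}\,\mathbb E\Big[\frac{N(T)}{N(0)\,\beta(T)}\,X\,\Big|\,\mathcal F(t)\Big] = \frac{\beta(t)}{N(t)}\,\mathbb E\Big[\frac{N(T)}{\beta(T)}\,X\,\Big|\,\mathcal F(t)\Big].$$

Therefore,

$$\mathbb E_N\Big[\frac{Y(T)}{N(T)}\,\Big|\,\mathcal F(t)\Big] = \frac{\beta(t)}{N(t)}\,\mathbb E\Big[\frac{N(T)}{\beta(T)}\cdot\frac{Y(T)}{N(T)}\,\Big|\,\mathcal F(t)\Big] = \frac{\beta(t)}{N(t)}\cdot\frac{Y(t)}{\beta(t)} = \frac{Y(t)}{N(t)},$$

which is the martingale property for $Y/N$ under $\mathbb P_N$. □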


33.1 Bond price as numéraire 


Fix $T \in (0,T^*]$ and let $B(t,T)$ be the numéraire. The risk-neutral measure for this numéraire is

$$\mathbb P_T(A) = \frac{1}{B(0,T)}\int_A \frac{B(T,T)}{\beta(T)}\,d\mathbb P = \frac{1}{B(0,T)}\int_A \frac{1}{\beta(T)}\,d\mathbb P,\qquad \forall A \in \mathcal F(T).$$

Because this bond is not defined after time $T$, we change the measure only "up to time $T$", i.e., using $\frac{1}{B(0,T)}\cdot\frac{B(T,T)}{\beta(T)}$ and only for $A \in \mathcal F(T)$.

$\mathbb P_T$ is called the $T$-forward measure. Denominated in units of $T$-maturity bond, the value of the stock is

$$F(t,T) = \frac{S(t)}{B(t,T)},\qquad 0 \le t \le T.$$

This is a martingale under $\mathbb P_T$, and so has a differential of the form

$$dF(t,T) = \sigma_F(t,T)\,F(t,T)\,dW_T(t),\qquad 0 \le t \le T, \tag{1.1}$$

i.e., a differential without a $dt$ term. The process $\{W_T(t);\ 0 \le t \le T\}$ is a Brownian motion under $\mathbb P_T$. We may assume without loss of generality that $\sigma_F(t,T) \ge 0$.

We write $F(t)$ rather than $F(t,T)$ from now on.


33.2 Stock price as numéraire 


Let $S(t)$ be the numéraire. In terms of this numéraire, the stock price is identically 1. The risk-neutral measure under this numéraire is

$$\mathbb P_S(A) = \frac{1}{S(0)}\int_A \frac{S(T^*)}{\beta(T^*)}\,d\mathbb P,\qquad \forall A \in \mathcal F(T^*).$$

Denominated in shares of stock, the value of the $T$-maturity bond is

$$\frac{B(t,T)}{S(t)} = \frac{1}{F(t)}.$$

This is a martingale under $\mathbb P_S$, and so has a differential of the form

$$d\Big(\frac{1}{F(t)}\Big) = \gamma(t,T)\Big(\frac{1}{F(t)}\Big)\,dW_S(t), \tag{2.1}$$

where $\{W_S(t);\ 0 \le t \le T^*\}$ is a Brownian motion under $\mathbb P_S$. We may assume without loss of generality that $\gamma(t,T) \ge 0$.

Theorem 2.72 The volatility $\gamma(t,T)$ in (2.1) is equal to the volatility $\sigma_F(t,T)$ in (1.1). In other words, (2.1) can be rewritten as

$$d\Big(\frac{1}{F(t)}\Big) = \sigma_F(t,T)\Big(\frac{1}{F(t)}\Big)\,dW_S(t). \tag{2.1'}$$
Proof: Let $g(x) = 1/x$, so $g'(x) = -1/x^2$, $g''(x) = 2/x^3$. Then

$$\begin{aligned}
d\Big(\frac{1}{F(t)}\Big) &= g'(F(t))\,dF(t) + \tfrac12g''(F(t))\,dF(t)\,dF(t) \\
&= -\frac{1}{F^2(t)}\,\sigma_F(t,T)F(t)\,dW_T(t) + \frac{1}{F^3(t)}\,\sigma_F^2(t,T)F^2(t)\,dt \\
&= \frac{1}{F(t)}\Big[-\sigma_F(t,T)\,dW_T(t) + \sigma_F^2(t,T)\,dt\Big] \\
&= \sigma_F(t,T)\Big(\frac{1}{F(t)}\Big)\Big[-dW_T(t) + \sigma_F(t,T)\,dt\Big].
\end{aligned}$$

Under $\mathbb P_T$, $-W_T$ is a Brownian motion. Under this measure, $1/F(t)$ has volatility $\sigma_F(t,T)$ and mean rate of return $\sigma_F^2(t,T)$. The change of measure from $\mathbb P_T$ to $\mathbb P_S$ makes $1/F(t)$ a martingale, i.e., it changes the mean return to zero, but the change of measure does not affect the volatility. Therefore, $\gamma(t,T)$ in (2.1) must be $\sigma_F(t,T)$ and $W_S$ must be

$$W_S(t) = -W_T(t) + \int_0^t \sigma_F(u,T)\,du.$$


33.3 Merton option pricing formula 


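The price at time zero of a European call is

$$\begin{aligned}
V(0) &= \mathbb E\Big[\frac{(S(T) - K)^+}{\beta(T)}\Big] \\
&= \mathbb E\Big[\frac{S(T)}{\beta(T)}\,\mathbb 1_{\{S(T)>K\}}\Big] - K\,\mathbb E\Big[\frac{1}{\beta(T)}\,\mathbb 1_{\{S(T)>K\}}\Big] \\
&= S(0)\int_{\{S(T)>K\}} \frac{S(T)}{S(0)\,\beta(T)}\,d\mathbb P - K\,B(0,T)\int_{\{S(T)>K\}} \frac{1}{B(0,T)\,\beta(T)}\,d\mathbb P \\
&= S(0)\,\mathbb P_S\{S(T) > K\} - K\,B(0,T)\,\mathbb P_T\{S(T) > K\} \\
&= S(0)\,\mathbb P_S\{F(T) > K\} - K\,B(0,T)\,\mathbb P_T\{F(T) > K\} \\
&= S(0)\,\mathbb P_S\Big\{\frac{1}{F(T)} < \frac1K\Big\} - K\,B(0,T)\,\mathbb P_T\{F(T) > K\},
\end{aligned}$$

where we used $F(T) = S(T)/B(T,T) = S(T)$. This is a completely general formula which permits computation as soon as we specify $\sigma_F(t,T)$. If we assume that $\sigma_F(t,T)$ is a constant $\sigma_F$, we have the following:

$$\frac{1}{F(T)} = \frac{B(0,T)}{S(0)}\exp\Big\{\sigma_FW_S(T) - \tfrac12\sigma_F^2T\Big\},$$

$$\begin{aligned}
\mathbb P_S\Big\{\frac{1}{F(T)} < \frac1K\Big\} &= \mathbb P_S\Big\{\sigma_FW_S(T) - \tfrac12\sigma_F^2T < \log\frac{S(0)}{K\,B(0,T)}\Big\} \\
&= \mathbb P_S\Big\{\frac{W_S(T)}{\sqrt T} < \frac{1}{\sigma_F\sqrt T}\Big[\log\frac{S(0)}{K\,B(0,T)} + \tfrac12\sigma_F^2T\Big]\Big\} \\
&= N(\rho_1),
\end{aligned}$$

where

$$\rho_1 = \frac{1}{\sigma_F\sqrt T}\Big[\log\frac{S(0)}{K\,B(0,T)} + \tfrac12\sigma_F^2T\Big].$$

Similarly,

$$F(T) = \frac{S(0)}{B(0,T)}\exp\Big\{\sigma_FW_T(T) - \tfrac12\sigma_F^2T\Big\},$$

$$\begin{aligned}
\mathbb P_T\{F(T) > K\} &= \mathbb P_T\Big\{\sigma_FW_T(T) - \tfrac12\sigma_F^2T > \log\frac{K\,B(0,T)}{S(0)}\Big\} \\
&= \mathbb P_T\Big\{\frac{-W_T(T)}{\sqrt T} < \frac{1}{\sigma_F\sqrt T}\Big[\log\frac{S(0)}{K\,B(0,T)} - \tfrac12\sigma_F^2T\Big]\Big\} \\
&= N(\rho_2),
\end{aligned}$$

where

$$\rho_2 = \frac{1}{\sigma_F\sqrt T}\Big[\log\frac{S(0)}{K\,B(0,T)} - \tfrac12\sigma_F^2T\Big].$$

If $r$ is constant, then $B(0,T) = e^{-rT}$,

$$\rho_1 = \frac{1}{\sigma_F\sqrt T}\Big[\log\frac{S(0)}{K} + \big(r + \tfrac12\sigma_F^2\big)T\Big],\qquad \rho_2 = \frac{1}{\sigma_F\sqrt T}\Big[\log\frac{S(0)}{K} + \big(r - \tfrac12\sigma_F^2\big)T\Big],$$

and we have the usual Black-Scholes formula. When $r$ is not constant, we still have the explicit formula

$$V(0) = S(0)\,N(\rho_1) - K\,B(0,T)\,N(\rho_2).$$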
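The explicit formula is straightforward to evaluate. The sketch below assumes a constant forward volatility $\sigma_F$ and takes $B(0,T)$ as an input (for example, read off a discount curve); `merton_call` and `norm_cdf` are hypothetical helper names, with $N$ computed from the standard library's `math.erf`. Setting $B(0,T) = e^{-rT}$ recovers the Black-Scholes price, as stated above.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function N(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def merton_call(S0, K, B0T, sigma_F, T):
    """V(0) = S(0) N(rho1) - K B(0,T) N(rho2), constant forward volatility sigma_F."""
    s = sigma_F * math.sqrt(T)
    x = math.log(S0 / (K * B0T))   # log of S(0) / (K B(0,T))
    rho1 = (x + 0.5 * s * s) / s
    rho2 = (x - 0.5 * s * s) / s
    return S0 * norm_cdf(rho1) - K * B0T * norm_cdf(rho2)


# Constant-rate special case: B(0,T) = exp(-rT) with r = 5%, T = 1 year.
bs = merton_call(100.0, 100.0, math.exp(-0.05 * 1.0), 0.2, 1.0)
```

With these inputs `bs` is about 10.45, the familiar Black-Scholes value. With a stochastic short rate, only the quoted $B(0,T)$ and the forward volatility $\sigma_F$ enter.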
As this formula suggests, if $\sigma_F$ is constant, then for $0 \le t < T$, the value of a European call expiring at time $T$ is

$$V(t) = S(t)\,N(\rho_1(t)) - K\,B(t,T)\,N(\rho_2(t)),$$

where

$$\rho_1(t) = \frac{1}{\sigma_F\sqrt{T-t}}\Big[\log\frac{F(t)}{K} + \tfrac12\sigma_F^2(T-t)\Big],\qquad \rho_2(t) = \frac{1}{\sigma_F\sqrt{T-t}}\Big[\log\frac{F(t)}{K} - \tfrac12\sigma_F^2(T-t)\Big].$$

This formula also suggests a hedge: at each time $t$, hold $N(\rho_1(t))$ shares of stock and short $K\,N(\rho_2(t))$ bonds.

We want to verify that this hedge is self-financing. Suppose we begin with initial wealth $V(0)$ and at each time $t$ hold $N(\rho_1(t))$ shares of stock. We short bonds as necessary to finance this. Will the position in the bond always be $-K\,N(\rho_2(t))$? If so, the value of the portfolio will always be

$$S(t)\,N(\rho_1(t)) - K\,B(t,T)\,N(\rho_2(t)) = V(t),$$

and we will have a hedge.

Mathematically, this question takes the following form. Let

$$\Delta(t) = N(\rho_1(t)).$$

At time $t$, hold $\Delta(t)$ shares of stock. If $X(t)$ is the value of the portfolio at time $t$, then $X(t) - \Delta(t)S(t)$ will be invested in the bond, so the number of bonds owned is $\dfrac{X(t) - \Delta(t)S(t)}{B(t,T)}$ and the portfolio value evolves according to

$$dX(t) = \Delta(t)\,dS(t) + \frac{X(t) - \Delta(t)S(t)}{B(t,T)}\,dB(t,T). \tag{3.1}$$

The value of the option evolves according to

$$\begin{aligned}
dV(t) &= N(\rho_1(t))\,dS(t) + S(t)\,dN(\rho_1(t)) + dS(t)\,dN(\rho_1(t)) \\
&\quad - K\,N(\rho_2(t))\,dB(t,T) - K\,dB(t,T)\,dN(\rho_2(t)) - K\,B(t,T)\,dN(\rho_2(t)).
\end{aligned}\tag{3.2}$$

If $X(0) = V(0)$, will $X(t) = V(t)$ for $0 \le t \le T$?

Formulas (3.1) and (3.2) are difficult to compare, so we simplify them by a change of numéraire. This change is justified by the following theorem.


Theorem 3.73 Changes of numéraire affect portfolio values in the way you would expect. 


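Proof: Suppose we have a model with $k$ assets with prices $S_1, S_2, \ldots, S_k$. At each time $t$, hold $\Delta_i(t)$ shares of asset $i$, $i = 1,2,\ldots,k-1$, and invest the remaining wealth in asset $k$. Begin with a nonrandom initial wealth $X(0)$, and let $X(t)$ be the value of the portfolio at time $t$. The number of shares of asset $k$ held at time $t$ is

$$\Delta_k(t) = \frac{X(t) - \sum_{i=1}^{k-1}\Delta_i(t)S_i(t)}{S_k(t)},$$

and $X$ evolves according to the equation

$$dX = \sum_{i=1}^{k-1}\Delta_i\,dS_i + \Big(X - \sum_{i=1}^{k-1}\Delta_iS_i\Big)\frac{dS_k}{S_k} = \sum_{i=1}^k\Delta_i\,dS_i.$$

Note that

$$X(t) = \sum_{i=1}^k\Delta_i(t)S_i(t),$$

and we only get to specify $\Delta_1, \ldots, \Delta_{k-1}$, not $\Delta_k$, in advance.

Let $N$ be a numéraire, and define

$$\hat X(t) = \frac{X(t)}{N(t)},\qquad \hat S_i(t) = \frac{S_i(t)}{N(t)},\quad i = 1,2,\ldots,k.$$

Then

$$\begin{aligned}
d\hat X &= \frac1N\,dX + X\,d\Big(\frac1N\Big) + dX\,d\Big(\frac1N\Big) \\
&= \sum_{i=1}^k\Big[\frac1N\,\Delta_i\,dS_i + \Delta_iS_i\,d\Big(\frac1N\Big) + \Delta_i\,dS_i\,d\Big(\frac1N\Big)\Big] \\
&= \sum_{i=1}^k\Delta_i\Big[\frac1N\,dS_i + S_i\,d\Big(\frac1N\Big) + dS_i\,d\Big(\frac1N\Big)\Big] \\
&= \sum_{i=1}^k\Delta_i\,d\hat S_i.
\end{aligned}$$

Now

$$\Delta_k = \frac{X - \sum_{i=1}^{k-1}\Delta_iS_i}{S_k} = \frac{X/N - \sum_{i=1}^{k-1}\Delta_iS_i/N}{S_k/N} = \frac{\hat X - \sum_{i=1}^{k-1}\Delta_i\hat S_i}{\hat S_k}.$$

Therefore,

$$d\hat X = \sum_{i=1}^{k-1}\Delta_i\,d\hat S_i + \Big(\hat X - \sum_{i=1}^{k-1}\Delta_i\hat S_i\Big)\frac{d\hat S_k}{\hat S_k}.$$

This is the formula for the evolution of a portfolio which holds $\Delta_i$ shares of asset $i$, $i = 1,2,\ldots,k-1$, and all assets and the portfolio are denominated in units of $N$. □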


We return to the European call hedging problem (comparison of (3.1) and (3.2)), but we now use the zero-coupon bond as numéraire. We still hold $\Delta(t) = N(\rho_1(t))$ shares of stock at each time $t$. In terms of the new numéraire, the asset values are

$$\text{Stock: } \frac{S(t)}{B(t,T)} = F(t),\qquad \text{Bond: } \frac{B(t,T)}{B(t,T)} = 1.$$

The portfolio value evolves according to

$$d\hat X(t) = \Delta(t)\,dF(t) + \big(\hat X(t) - \Delta(t)F(t)\big)\frac{d(1)}{1} = \Delta(t)\,dF(t). \tag{3.1'}$$

In the new numéraire, the option value formula

$$V(t) = N(\rho_1(t))\,S(t) - K\,B(t,T)\,N(\rho_2(t))$$

becomes

$$\hat V(t) = \frac{V(t)}{B(t,T)} = N(\rho_1(t))\,F(t) - K\,N(\rho_2(t)),$$

and

$$d\hat V = N(\rho_1(t))\,dF(t) + F(t)\,dN(\rho_1(t)) + dN(\rho_1(t))\,dF(t) - K\,dN(\rho_2(t)). \tag{3.2'}$$

To show that the hedge works, we must show that

$$F(t)\,dN(\rho_1(t)) + dN(\rho_1(t))\,dF(t) - K\,dN(\rho_2(t)) = 0.$$

This is a homework problem.
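Short of the algebra requested in the homework, the claim can be checked by simulation. The sketch below is an illustration under stated assumptions, not a proof: it takes $\sigma_F$ constant, so that under $\mathbb P_T$ the forward price follows the driftless geometric Brownian motion $dF = \sigma_F F\,dW_T$; it rebalances $\Delta(t) = N(\rho_1(t))$ at discrete dates using (3.1'); and it compares $\hat X(T)$ with $\hat V(T) = (F(T) - K)^+$, using $B(T,T) = 1$. Discrete rebalancing leaves a small hedging error that shrinks as the number of steps grows. The function name and its parameters are hypothetical.

```python
import math
import random


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def hedge_in_bond_numeraire(F0, K, sigma_F, T, n_steps, seed=0):
    """Discretely rebalanced call hedge, all values in units of the T-maturity bond.

    Returns (X_hat(T), V_hat(T)) for one simulated path of F under P_T.
    """
    rng = random.Random(seed)
    dt = T / n_steps
    # Start from the option value V_hat(0) = F(0) N(rho1) - K N(rho2).
    s = sigma_F * math.sqrt(T)
    rho1 = (math.log(F0 / K) + 0.5 * s * s) / s
    X = F0 * norm_cdf(rho1) - K * norm_cdf(rho1 - s)
    F = F0
    for i in range(n_steps):
        s = sigma_F * math.sqrt(T - i * dt)          # sigma_F sqrt(T - t)
        delta = norm_cdf((math.log(F / K) + 0.5 * s * s) / s)  # N(rho1(t))
        dW = rng.gauss(0.0, math.sqrt(dt))
        F_new = F * math.exp(sigma_F * dW - 0.5 * sigma_F ** 2 * dt)  # exact GBM step
        X += delta * (F_new - F)                     # (3.1'): dX_hat = Delta dF
        F = F_new
    return X, max(F - K, 0.0)


errors = [abs(x - v) for x, v in
          (hedge_in_bond_numeraire(100.0, 100.0, 0.2, 1.0, 2000, seed=s) for s in range(5))]
```

Each entry of `errors` should be small compared with the option value of roughly 8 bond units; rerunning with fewer steps makes the errors visibly larger.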
Chapter 34


Brace-Gatarek-Musiela model 


34.1 Review of HJM under risk-neutral $\mathbb P$

$f(t,T)$ = Forward rate at time $t$ for borrowing at time $T$.

$$df(t,T) = \sigma(t,T)\,\sigma^*(t,T)\,dt + \sigma(t,T)\,dW(t),$$

where

$$\sigma^*(t,T) = \int_t^T \sigma(t,u)\,du.$$

The interest rate is $r(t) = f(t,t)$. The bond prices

$$B(t,T) = \mathbb E\Big[\exp\Big\{-\int_t^T r(u)\,du\Big\}\,\Big|\,\mathcal F(t)\Big] = \exp\Big\{-\int_t^T f(t,u)\,du\Big\}$$

satisfy

$$dB(t,T) = r(t)\,B(t,T)\,dt - \sigma^*(t,T)\,B(t,T)\,dW(t),$$

so $\sigma^*(t,T)$ is the volatility of the $T$-maturity bond.

To implement HJM, you specify a function

$$\sigma(t,T),\qquad 0 \le t \le T.$$

A simple choice we would like to use is

$$\sigma(t,T) = \sigma f(t,T),$$

where $\sigma > 0$ is the constant "volatility of the forward rate". This is not possible because it leads to

$$\sigma^*(t,T) = \sigma\int_t^T f(t,u)\,du,$$

$$df(t,T) = \sigma^2 f(t,T)\Big(\int_t^T f(t,u)\,du\Big)\,dt + \sigma f(t,T)\,dW(t),$$
and Heath, Jarrow and Morton show that solutions to this equation explode before $T$.

The problem with the above equation is that the $dt$ term grows like the square of the forward rate. To see what problem this causes, consider the similar deterministic ordinary differential equation

$$f'(t) = f^2(t),$$

where $f(0) = c > 0$. We have

$$\frac{f'(t)}{f^2(t)} = 1,\qquad -\frac{d}{dt}\,\frac{1}{f(t)} = 1,\qquad -\frac{1}{f(t)} + \frac{1}{f(0)} = t,$$

$$\frac{1}{f(t)} = \frac1c - t = \frac{1 - ct}{c},\qquad f(t) = \frac{c}{1 - ct}.$$

This solution explodes at $t = 1/c$.


34.2 Brace-Gatarek-Musiela model 


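New variables:

Current time $t$

Time to maturity $\tau = T - t$.

Forward rates:

$$r(t,\tau) = f(t,t+\tau),\qquad r(t,0) = f(t,t) = r(t), \tag{2.1}$$

$$\frac{\partial}{\partial\tau}r(t,\tau) = \frac{\partial}{\partial T}f(t,t+\tau). \tag{2.2}$$

Bond prices:

$$\begin{aligned}
D(t,\tau) &= B(t,t+\tau) = \exp\Big\{-\int_t^{t+\tau} f(t,v)\,dv\Big\} \\
&= \exp\Big\{-\int_0^\tau f(t,t+u)\,du\Big\}\qquad (u = v - t,\ du = dv) \\
&= \exp\Big\{-\int_0^\tau r(t,u)\,du\Big\},
\end{aligned}\tag{2.3}$$

$$\frac{\partial}{\partial\tau}D(t,\tau) = \frac{\partial}{\partial T}B(t,t+\tau) = -r(t,\tau)\,D(t,\tau). \tag{2.4}$$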
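We will now write $\sigma(t,\tau)$, with $\tau = T - t$, rather than $\sigma(t,T)$. In this notation, the HJM model is

$$df(t,T) = \sigma(t,\tau)\,\sigma^*(t,\tau)\,dt + \sigma(t,\tau)\,dW(t), \tag{2.5}$$

$$dB(t,T) = r(t)\,B(t,T)\,dt - \sigma^*(t,\tau)\,B(t,T)\,dW(t), \tag{2.6}$$

where

$$\sigma^*(t,\tau) = \int_0^\tau \sigma(t,u)\,du, \tag{2.7}$$

$$\frac{\partial}{\partial\tau}\sigma^*(t,\tau) = \sigma(t,\tau). \tag{2.8}$$

We now derive the differentials of $r(t,\tau)$ and $D(t,\tau)$, analogous to (2.5) and (2.6). We have

$$\begin{aligned}
dr(t,\tau) &= df(t,t+\tau) + \frac{\partial}{\partial T}f(t,t+\tau)\,dt\qquad\text{(the differential $df$ applies only to the first argument)} \\
&\overset{(2.5),(2.2)}{=} \sigma(t,\tau)\,\sigma^*(t,\tau)\,dt + \sigma(t,\tau)\,dW(t) + \frac{\partial}{\partial\tau}r(t,\tau)\,dt \\
&= \frac{\partial}{\partial\tau}\Big[r(t,\tau) + \tfrac12\big(\sigma^*(t,\tau)\big)^2\Big]\,dt + \sigma(t,\tau)\,dW(t).
\end{aligned}\tag{2.9}$$

Also,

$$\begin{aligned}
dD(t,\tau) &= dB(t,t+\tau) + \frac{\partial}{\partial T}B(t,t+\tau)\,dt\qquad\text{(the differential $dB$ applies only to the first argument)} \\
&\overset{(2.6),(2.4)}{=} r(t)\,B(t,t+\tau)\,dt - \sigma^*(t,\tau)\,B(t,t+\tau)\,dW(t) - r(t,\tau)\,D(t,\tau)\,dt \\
&= \big[r(t,0) - r(t,\tau)\big]\,D(t,\tau)\,dt - \sigma^*(t,\tau)\,D(t,\tau)\,dW(t).
\end{aligned}\tag{2.10}$$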


34.3 LIBOR 


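Fix $\delta > 0$ (say, $\delta = \frac14$ year). An amount of $D(t,\delta)$ dollars invested at time $t$ in a $(t+\delta)$-maturity bond grows to one dollar at time $t + \delta$. $L(t,0)$ is defined to be the corresponding rate of simple interest:

$$D(t,\delta)\big(1 + \delta L(t,0)\big) = 1,$$

$$1 + \delta L(t,0) = \frac{1}{D(t,\delta)} = \exp\Big\{\int_0^\delta r(t,u)\,du\Big\}.$$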
34.4 Forward LIBOR

$\delta > 0$ is still fixed. At time $t$, agree to invest $\frac{D(t,\tau+\delta)}{D(t,\tau)}$ dollars at time $t + \tau$, with payback of one dollar at time $t + \tau + \delta$. We can do this at time $t$ by shorting $\frac{D(t,\tau+\delta)}{D(t,\tau)}$ bonds maturing at time $t + \tau$ and going long one bond maturing at time $t + \tau + \delta$. The value of this portfolio at time $t$ is

$$-\frac{D(t,\tau+\delta)}{D(t,\tau)}\,D(t,\tau) + D(t,\tau+\delta) = 0.$$

The forward LIBOR $L(t,\tau)$ is defined to be the simple (forward) interest rate for this investment:

$$\frac{D(t,\tau+\delta)}{D(t,\tau)}\big(1 + \delta L(t,\tau)\big) = 1,$$

$$1 + \delta L(t,\tau) = \frac{D(t,\tau)}{D(t,\tau+\delta)} = \frac{\exp\{-\int_0^\tau r(t,u)\,du\}}{\exp\{-\int_0^{\tau+\delta} r(t,u)\,du\}} = \exp\Big\{\int_\tau^{\tau+\delta} r(t,u)\,du\Big\},$$

$$L(t,\tau) = \frac{\exp\{\int_\tau^{\tau+\delta} r(t,u)\,du\} - 1}{\delta}. \tag{4.1}$$

Connection with forward rates:

$$\frac{\partial}{\partial\delta}\exp\Big\{\int_\tau^{\tau+\delta} r(t,u)\,du\Big\}\Big|_{\delta=0} = r(t,\tau+\delta)\exp\Big\{\int_\tau^{\tau+\delta} r(t,u)\,du\Big\}\Big|_{\delta=0} = r(t,\tau),$$

so

$$f(t,t+\tau) = r(t,\tau) = \lim_{\delta\downarrow0}\frac{\exp\{\int_\tau^{\tau+\delta} r(t,u)\,du\} - 1}{\delta}. \tag{4.2}$$

Compare this with

$$L(t,\tau) = \frac{\exp\{\int_\tau^{\tau+\delta} r(t,u)\,du\} - 1}{\delta},\qquad \delta > 0\ \text{fixed}.$$

$r(t,\tau)$ is the continuously compounded rate. $L(t,\tau)$ is the simple rate over a period of duration $\delta$.

We cannot have a log-normal model for $r(t,\tau)$ because solutions explode, as we saw in Section 34.1. For fixed positive $\delta$, we can have a log-normal model for $L(t,\tau)$.
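Formula (4.1) and the limit (4.2) can be illustrated numerically. In this sketch the forward-rate curve $u \mapsto r(t,u)$ at a fixed time $t$ is an arbitrary Python callable (a hypothetical input, not market data), the integral is approximated by the composite trapezoidal rule, and `forward_libor` is a hypothetical name. Shrinking $\delta$ shows $L(t,\tau)$ approaching $r(t,\tau)$.

```python
import math


def forward_libor(r_curve, tau, delta, n=2000):
    """L(t, tau) = (exp(int_tau^{tau+delta} r(t,u) du) - 1) / delta, eq. (4.1).

    r_curve: callable u -> r(t, u) at a fixed time t.
    The integral uses the composite trapezoidal rule with n subintervals.
    """
    h = delta / n
    values = [r_curve(tau + k * h) for k in range(n + 1)]
    integral = h * (math.fsum(values) - 0.5 * (values[0] + values[-1]))
    return math.expm1(integral) / delta  # expm1 avoids cancellation for small delta


def example_curve(u):
    """Hypothetical upward-sloping forward curve r(t, u) = 0.03 + 0.01 u."""
    return 0.03 + 0.01 * u


# L(t, 2) for shrinking accrual periods; the values decrease toward r(t, 2) = 0.05.
approx = [forward_libor(example_curve, 2.0, d) for d in (0.25, 0.01, 1e-4)]
```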


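34.5 The dynamics of $L(t,\tau)$

We want to choose $\sigma(t,\tau)$, $t \ge 0$, $\tau \ge 0$, appearing in (2.5) so that

$$dL(t,\tau) = (\ldots)\,dt + L(t,\tau)\,\gamma(t,\tau)\,dW(t)$$

for some $\gamma(t,\tau)$, $t \ge 0$, $\tau \ge 0$. This is the BGM model, and is a subclass of HJM models, corresponding to particular choices of $\sigma(t,\tau)$.

Recall (2.9):

$$dr(t,u) = \frac{\partial}{\partial u}\Big[r(t,u) + \tfrac12\big(\sigma^*(t,u)\big)^2\Big]\,dt + \sigma(t,u)\,dW(t).$$

Therefore,

$$\begin{aligned}
d\int_\tau^{\tau+\delta} r(t,u)\,du &= \int_\tau^{\tau+\delta} dr(t,u)\,du \\
&= \Big[\int_\tau^{\tau+\delta}\frac{\partial}{\partial u}\Big(r(t,u) + \tfrac12\big(\sigma^*(t,u)\big)^2\Big)\,du\Big]\,dt + \Big[\int_\tau^{\tau+\delta}\sigma(t,u)\,du\Big]\,dW(t) \\
&= \Big[r(t,\tau+\delta) - r(t,\tau) + \tfrac12\big(\sigma^*(t,\tau+\delta)\big)^2 - \tfrac12\big(\sigma^*(t,\tau)\big)^2\Big]\,dt + \big[\sigma^*(t,\tau+\delta) - \sigma^*(t,\tau)\big]\,dW(t),
\end{aligned}\tag{5.1}$$

and

$$\begin{aligned}
d\exp\Big\{\int_\tau^{\tau+\delta} r(t,u)\,du\Big\} &= \exp\Big\{\int_\tau^{\tau+\delta} r(t,u)\,du\Big\}\Big[d\int_\tau^{\tau+\delta} r(t,u)\,du + \tfrac12\Big(d\int_\tau^{\tau+\delta} r(t,u)\,du\Big)^2\Big] \\
&= \big[1 + \delta L(t,\tau)\big]\Big\{\Big[r(t,\tau+\delta) - r(t,\tau) + \tfrac12\big(\sigma^*(t,\tau+\delta)\big)^2 - \tfrac12\big(\sigma^*(t,\tau)\big)^2\Big]\,dt \\
&\qquad + \big[\sigma^*(t,\tau+\delta) - \sigma^*(t,\tau)\big]\,dW(t) + \tfrac12\big[\sigma^*(t,\tau+\delta) - \sigma^*(t,\tau)\big]^2\,dt\Big\} \\
&= \big[1 + \delta L(t,\tau)\big]\Big\{\big[r(t,\tau+\delta) - r(t,\tau)\big]\,dt + \sigma^*(t,\tau+\delta)\big[\sigma^*(t,\tau+\delta) - \sigma^*(t,\tau)\big]\,dt \\
&\qquad + \big[\sigma^*(t,\tau+\delta) - \sigma^*(t,\tau)\big]\,dW(t)\Big\}.
\end{aligned}\tag{5.2}$$

But

$$\frac{\partial}{\partial\tau}L(t,\tau) = \frac{\partial}{\partial\tau}\,\frac{\exp\{\int_\tau^{\tau+\delta} r(t,u)\,du\} - 1}{\delta} = \frac1\delta\exp\Big\{\int_\tau^{\tau+\delta} r(t,u)\,du\Big\}\big[r(t,\tau+\delta) - r(t,\tau)\big] = \frac1\delta\big[1 + \delta L(t,\tau)\big]\big[r(t,\tau+\delta) - r(t,\tau)\big].$$

Therefore, since $dL(t,\tau) = \frac1\delta\,d\exp\{\int_\tau^{\tau+\delta} r(t,u)\,du\}$,

$$dL(t,\tau) = \frac{\partial}{\partial\tau}L(t,\tau)\,dt + \frac1\delta\big[1 + \delta L(t,\tau)\big]\big[\sigma^*(t,\tau+\delta) - \sigma^*(t,\tau)\big]\big[\sigma^*(t,\tau+\delta)\,dt + dW(t)\big].$$

Take $\gamma(t,\tau)$ to be given by

$$\gamma(t,\tau)\,L(t,\tau) = \frac1\delta\big[1 + \delta L(t,\tau)\big]\big[\sigma^*(t,\tau+\delta) - \sigma^*(t,\tau)\big]. \tag{5.3}$$

Then

$$dL(t,\tau) = \Big[\frac{\partial}{\partial\tau}L(t,\tau) + \gamma(t,\tau)\,L(t,\tau)\,\sigma^*(t,\tau+\delta)\Big]\,dt + \gamma(t,\tau)\,L(t,\tau)\,dW(t). \tag{5.4}$$

Note that (5.3) is equivalent to

$$\sigma^*(t,\tau+\delta) = \sigma^*(t,\tau) + \frac{\delta L(t,\tau)\,\gamma(t,\tau)}{1 + \delta L(t,\tau)}. \tag{5.3'}$$

Plugging this into (5.4) yields

$$dL(t,\tau) = \Big[\frac{\partial}{\partial\tau}L(t,\tau) + \gamma(t,\tau)\,L(t,\tau)\,\sigma^*(t,\tau) + \frac{\delta L^2(t,\tau)\,\gamma^2(t,\tau)}{1 + \delta L(t,\tau)}\Big]\,dt + \gamma(t,\tau)\,L(t,\tau)\,dW(t). \tag{5.4'}$$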


34.6 Implementation of BGM 
Obtain the initial forward LIBOR curve

$$L(0,\tau),\qquad \tau \ge 0,$$

from market data. Choose a forward LIBOR volatility function (usually nonrandom)

$$\gamma(t,\tau),\qquad t \ge 0,\ \tau \ge 0.$$

Because LIBOR gives no rate information on time periods smaller than $\delta$, we must also choose a partial bond volatility function

$$\sigma^*(t,\tau),\qquad t \ge 0,\ 0 \le \tau < \delta,$$

for maturities less than $\delta$ from the current time variable $t$.

With these functions, we can for each $\tau \in [0,\delta)$ solve (5.4') to obtain

$$L(t,\tau),\qquad t \ge 0,\ 0 \le \tau < \delta.$$

Plugging the solution into (5.3'), we obtain $\sigma^*(t,\tau)$ for $\delta \le \tau < 2\delta$. We then solve (5.4') to obtain

$$L(t,\tau),\qquad t \ge 0,\ \delta \le \tau < 2\delta,$$

and we continue recursively.
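The recursive step through (5.3') can be sketched at a single fixed time $t$. This is an illustration only: it assumes the forward LIBOR values $L(t,\tau_0 + j\delta)$ and volatilities $\gamma(t,\tau_0 + j\delta)$ on one ladder of maturities are already available, together with the chosen starting value $\sigma^*(t,\tau_0)$ for some $0 \le \tau_0 < \delta$; `sigma_star_ladder` is a hypothetical name. A full implementation would interleave this step with solving (5.4') forward in $t$.

```python
def sigma_star_ladder(sigma_star_0, L_values, gamma_values, delta):
    """Apply (5.3') repeatedly at one fixed time t.

    sigma_star_0: sigma*(t, tau0) for some tau0 in [0, delta), chosen by the modeller.
    L_values[j], gamma_values[j]: L(t, tau0 + j*delta) and gamma(t, tau0 + j*delta).
    Returns [sigma*(t, tau0), sigma*(t, tau0 + delta), sigma*(t, tau0 + 2*delta), ...].
    """
    ladder = [sigma_star_0]
    for L, g in zip(L_values, gamma_values):
        # sigma*(t, tau + delta) = sigma*(t, tau) + delta L gamma / (1 + delta L)
        ladder.append(ladder[-1] + delta * L * g / (1.0 + delta * L))
    return ladder


# Example: sigma*(t, tau0) = 0 and flat L = 5%, gamma = 20%, delta = 1/4 year.
ladder = sigma_star_ladder(0.0, [0.05, 0.05, 0.05], [0.2, 0.2, 0.2], 0.25)
```

Each increment is positive whenever $L$ and $\gamma$ are, so the resulting bond volatility $\sigma^*(t,\cdot)$ grows with maturity along the ladder.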


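Remark 34.1 BGM is a special case of HJM with HJM's $\sigma^*(t,\tau)$ generated recursively by (5.3'). In BGM, $\gamma(t,\tau)$ is usually taken to be nonrandom; the resulting $\sigma^*(t,\tau)$ is random.

Remark 34.2 (5.4) (equivalently, (5.4')) is a stochastic partial differential equation because of the $\frac{\partial}{\partial\tau}L(t,\tau)$ term. This is not as terrible as it first appears. Returning to the HJM variables $t$ and $T$, set

$$K(t,T) = L(t,T-t).$$

Then

$$dK(t,T) = dL(t,T-t) - \frac{\partial}{\partial\tau}L(t,T-t)\,dt,$$

and (5.4) and (5.4') become

$$\begin{aligned}
dK(t,T) &= \gamma(t,T-t)\,K(t,T)\big[\sigma^*(t,T-t+\delta)\,dt + dW(t)\big] \\
&= \gamma(t,T-t)\,K(t,T)\Big[\Big(\sigma^*(t,T-t) + \frac{\delta K(t,T)\,\gamma(t,T-t)}{1 + \delta K(t,T)}\Big)\,dt + dW(t)\Big].
\end{aligned}\tag{6.1}$$

Remark 34.3 From (5.3) we have

$$\gamma(t,\tau)\,L(t,\tau) = \big[1 + \delta L(t,\tau)\big]\,\frac{\sigma^*(t,\tau+\delta) - \sigma^*(t,\tau)}{\delta}.$$

If we let $\delta\downarrow0$, then

$$\frac{\sigma^*(t,\tau+\delta) - \sigma^*(t,\tau)}{\delta} \to \frac{\partial}{\partial\tau}\sigma^*(t,\tau) = \sigma(t,\tau),$$

and so

$$\gamma(t,T-t)\,K(t,T) \to \sigma(t,T-t).$$

We saw before (eq. 4.2) that as $\delta\downarrow0$,

$$L(t,\tau) \to r(t,\tau) = f(t,t+\tau),$$

so

$$K(t,T) \to f(t,T).$$

Therefore, the limit as $\delta\downarrow0$ of (6.1) is given by equation (2.5):

$$df(t,T) = \sigma(t,T-t)\big[\sigma^*(t,T-t)\,dt + dW(t)\big].$$

Remark 34.4 Although the $dt$ term in (6.1) has the term $\dfrac{\delta\gamma^2(t,T-t)K^2(t,T)}{1 + \delta K(t,T)}$ involving $K^2$, solutions to this equation do not explode because

$$\frac{\delta\gamma^2(t,T-t)\,K^2(t,T)}{1 + \delta K(t,T)} \le \frac{\delta\gamma^2(t,T-t)\,K^2(t,T)}{\delta K(t,T)} = \gamma^2(t,T-t)\,K(t,T).$$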


34.7 Bond prices 


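Let $\beta(t) = \exp\{\int_0^t r(u)\,du\}$. From (2.6) we have

$$d\Big(\frac{B(t,T)}{\beta(t)}\Big) = \frac{1}{\beta(t)}\big[-r(t)\,B(t,T)\,dt + dB(t,T)\big] = -\frac{B(t,T)}{\beta(t)}\,\sigma^*(t,T-t)\,dW(t).$$

The solution to this stochastic differential equation is given by

$$\frac{B(t,T)}{\beta(t)\,B(0,T)} = \exp\Big\{-\int_0^t \sigma^*(u,T-u)\,dW(u) - \frac12\int_0^t \big(\sigma^*(u,T-u)\big)^2\,du\Big\}.$$

This is a martingale, and we can use it to switch to the forward measure

$$\mathbb P_T(A) = \frac{1}{B(0,T)}\int_A \frac{B(T,T)}{\beta(T)}\,d\mathbb P = \int_A \exp\Big\{-\int_0^T \sigma^*(u,T-u)\,dW(u) - \frac12\int_0^T \big(\sigma^*(u,T-u)\big)^2\,du\Big\}\,d\mathbb P,\qquad \forall A \in \mathcal F(T).$$

Girsanov's Theorem implies that

$$W_T(t) = W(t) + \int_0^t \sigma^*(u,T-u)\,du,\qquad 0 \le t \le T,$$

is a Brownian motion under $\mathbb P_T$.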
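34.8 Forward LIBOR under more forward measure

From (6.1) we have

$$dK(t,T) = \gamma(t,T-t)\,K(t,T)\big[\sigma^*(t,T-t+\delta)\,dt + dW(t)\big] = \gamma(t,T-t)\,K(t,T)\,dW_{T+\delta}(t),$$

where, by Girsanov's Theorem applied with maturity $T + \delta$, $W_{T+\delta}(t) = W(t) + \int_0^t \sigma^*(u,T+\delta-u)\,du$ is a Brownian motion under the $(T+\delta)$-forward measure $\mathbb P_{T+\delta}$. So

$$K(t,T) = K(0,T)\exp\Big\{\int_0^t \gamma(u,T-u)\,dW_{T+\delta}(u) - \frac12\int_0^t \gamma^2(u,T-u)\,du\Big\}$$

and

$$\begin{aligned}
K(T,T) &= K(0,T)\exp\Big\{\int_0^T \gamma(u,T-u)\,dW_{T+\delta}(u) - \frac12\int_0^T \gamma^2(u,T-u)\,du\Big\} \\
&= K(t,T)\exp\Big\{\int_t^T \gamma(u,T-u)\,dW_{T+\delta}(u) - \frac12\int_t^T \gamma^2(u,T-u)\,du\Big\}.
\end{aligned}\tag{8.1}$$

We assume that $\gamma$ is nonrandom. Then

$$X(t) = \int_t^T \gamma(u,T-u)\,dW_{T+\delta}(u) - \frac12\int_t^T \gamma^2(u,T-u)\,du \tag{8.2}$$

is normal with variance

$$\rho^2(t) = \int_t^T \gamma^2(u,T-u)\,du$$

and mean $-\frac12\rho^2(t)$.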


34.9 Pricing an interest rate caplet 


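Consider a floating rate interest payment settled in arrears. At time $T + \delta$, the floating rate interest payment due is $\delta L(T,0) = \delta K(T,T)$, where $L(T,0)$ is the LIBOR at time $T$. A caplet protects its owner by requiring him to pay only the cap $\delta c$ if $\delta K(T,T) > \delta c$. Thus, the value of the caplet at time $T + \delta$ is $\delta\big(K(T,T) - c\big)^+$. We determine its value at times $0 \le t \le T + \delta$.

Case I: $T \le t \le T + \delta$.

$$\begin{aligned}
C_{T+\delta}(t) &= \beta(t)\,\mathbb E\Big[\frac{\delta}{\beta(T+\delta)}\big(K(T,T) - c\big)^+\,\Big|\,\mathcal F(t)\Big] \\
&= \delta\big(K(T,T) - c\big)^+\,\mathbb E\Big[\frac{\beta(t)}{\beta(T+\delta)}\,\Big|\,\mathcal F(t)\Big] \\
&= \delta\big(K(T,T) - c\big)^+\,B(t,T+\delta).
\end{aligned}\tag{9.1}$$

Case II: $0 \le t < T$.

Recall that

$$\mathbb P_{T+\delta}(A) = \int_A Z(T+\delta)\,d\mathbb P,\qquad \forall A \in \mathcal F(T+\delta),$$

where

$$Z(t) = \frac{B(t,T+\delta)}{\beta(t)\,B(0,T+\delta)}.$$

We have, using $\mathbb E[Z(T+\delta)\,Y\,|\,\mathcal F(t)] = Z(t)\,\mathbb E_{T+\delta}[Y\,|\,\mathcal F(t)]$,

$$\begin{aligned}
C_{T+\delta}(t) &= \beta(t)\,\mathbb E\Big[\frac{\delta}{\beta(T+\delta)}\big(K(T,T) - c\big)^+\,\Big|\,\mathcal F(t)\Big] \\
&= \delta\,\beta(t)\,B(0,T+\delta)\,\mathbb E\Big[\frac{B(T+\delta,T+\delta)}{\beta(T+\delta)\,B(0,T+\delta)}\big(K(T,T) - c\big)^+\,\Big|\,\mathcal F(t)\Big] \\
&= \delta\,\beta(t)\,B(0,T+\delta)\,Z(t)\,\mathbb E_{T+\delta}\Big[\big(K(T,T) - c\big)^+\,\Big|\,\mathcal F(t)\Big] \\
&= \delta\,B(t,T+\delta)\,\mathbb E_{T+\delta}\Big[\big(K(T,T) - c\big)^+\,\Big|\,\mathcal F(t)\Big].
\end{aligned}$$

From (8.1) and (8.2) we have

$$K(T,T) = K(t,T)\exp\{X(t)\},$$

where $X(t)$ is normal under $\mathbb P_{T+\delta}$ with variance $\rho^2(t) = \int_t^T \gamma^2(u,T-u)\,du$ and mean $-\frac12\rho^2(t)$. Furthermore, $X(t)$ is independent of $\mathcal F(t)$. Hence

$$C_{T+\delta}(t) = \delta\,B(t,T+\delta)\,\mathbb E_{T+\delta}\Big[\big(K(t,T)\exp\{X(t)\} - c\big)^+\,\Big|\,\mathcal F(t)\Big].$$

Set

$$g(y) = \mathbb E_{T+\delta}\Big[\big(y\exp\{X(t)\} - c\big)^+\Big] = y\,N\Big(\frac{1}{\rho(t)}\Big[\log\frac yc + \frac12\rho^2(t)\Big]\Big) - c\,N\Big(\frac{1}{\rho(t)}\Big[\log\frac yc - \frac12\rho^2(t)\Big]\Big).$$

Then

$$C_{T+\delta}(t) = \delta\,B(t,T+\delta)\,g(K(t,T)),\qquad 0 \le t < T. \tag{9.2}$$

In the case of constant $\gamma$, we have

$$\rho(t) = \gamma\sqrt{T - t},$$

and (9.2) is called the Black caplet formula.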
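A sketch of the Black caplet formula (9.2) with constant $\gamma$, together with a Monte Carlo check of the closed form for $g$ under the stated distribution of $X(t)$ (normal with mean $-\frac12\rho^2$ and variance $\rho^2$). The function names, argument order and sample size are our own choices, and $N$ is computed from `math.erf`.

```python
import math
import random


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def g(y, c, rho):
    """g(y) = E[(y e^X - c)^+] for X ~ Normal(-rho^2/2, rho^2), closed form."""
    if rho <= 0.0:
        return max(y - c, 0.0)  # no remaining variance: intrinsic value
    d1 = (math.log(y / c) + 0.5 * rho * rho) / rho
    return y * norm_cdf(d1) - c * norm_cdf(d1 - rho)


def g_monte_carlo(y, c, rho, n=200_000, seed=0):
    """Plain Monte Carlo estimate of the same expectation."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(-0.5 * rho * rho, rho)
        total += max(y * math.exp(x) - c, 0.0)
    return total / n


def black_caplet(delta, B_t_Tdelta, K_tT, c, gamma, t, T):
    """(9.2) with constant gamma: delta * B(t, T+delta) * g(K(t,T)), rho = gamma sqrt(T-t)."""
    return delta * B_t_Tdelta * g(K_tT, c, gamma * math.sqrt(T - t))


# Example: quarterly caplet at the money, one year to the fixing date.
price = black_caplet(0.25, 0.9, 0.05, 0.05, 0.2, 0.0, 1.0)
```

The Monte Carlo estimate is only a sanity check of the closed form; the pricing itself needs nothing beyond $B(t,T+\delta)$, $K(t,T)$ and $\rho(t)$.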
34.10 Pricing an interest rate cap

Let

$$T_0 = 0,\quad T_1 = \delta,\quad T_2 = 2\delta,\quad \ldots,\quad T_n = n\delta.$$

A cap is a series of payments

$$\delta\big(K(T_k,T_k) - c\big)^+\ \text{at time } T_{k+1},\qquad k = 0,1,\ldots,n-1.$$

The value at time $t$ of the cap is the value of all remaining caplets, i.e.,

$$C(t) = \sum_{k:\ t \le T_k} C_{T_{k+1}}(t).$$


34.11 Calibration of BGM 


The interest rate caplet with cap $c$ on $L(T,0)$, paid at time $T + \delta$, has time-zero value

$$C_{T+\delta}(0) = \delta\,B(0,T+\delta)\,g(K(0,T)),$$

where $g$ (defined in the last section) depends on

$$\int_0^T \gamma^2(u,T-u)\,du.$$

Let us suppose $\gamma$ is a deterministic function of its second argument, i.e.,

$$\gamma(t,\tau) = \gamma(\tau).$$

Then $g$ depends on

$$\int_0^T \gamma^2(T-u)\,du = \int_0^T \gamma^2(v)\,dv.$$

If we know the caplet price $C_{T+\delta}(0)$, we can "back out" the squared volatility $\int_0^T \gamma^2(v)\,dv$. If we know caplet prices

$$C_{T_0+\delta}(0),\ C_{T_1+\delta}(0),\ \ldots,\ C_{T_n+\delta}(0),$$

where $T_0 < T_1 < \cdots < T_n$, we can "back out"

$$\int_0^{T_0}\gamma^2(v)\,dv,\qquad \int_{T_0}^{T_1}\gamma^2(v)\,dv = \int_0^{T_1}\gamma^2(v)\,dv - \int_0^{T_0}\gamma^2(v)\,dv,\qquad \ldots,\qquad \int_{T_{n-1}}^{T_n}\gamma^2(v)\,dv.$$

In this case, we may assume that $\gamma$ is constant on each of the intervals

$$[0,T_0],\ [T_0,T_1],\ \ldots,\ [T_{n-1},T_n],$$

and choose these constants to make the above integrals have the values implied by the caplet prices.

If we know caplet prices $C_{T+\delta}(0)$ for all $T > 0$, we can "back out" $\int_0^T \gamma^2(v)\,dv$ and then differentiate to discover $\gamma^2(\tau)$ and $\gamma(\tau) = \sqrt{\gamma^2(\tau)}$ for all $\tau > 0$.
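The bootstrap just described can be sketched as follows. This is a simplified illustration: it assumes the caplet prices, the discount factors $B(0,T_k+\delta)$ and the initial forward LIBORs $K(0,T_k)$ are given, inverts $g$ by bisection (our choice, relying on $g$ being increasing in $\rho$), and returns the piecewise-constant $\gamma$ on $[0,T_0], [T_0,T_1], \ldots$. All function names are hypothetical.

```python
import math


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def g(y, c, rho):
    """g(y) = y N(d1) - c N(d1 - rho), d1 = (log(y/c) + rho^2/2) / rho (Section 34.9)."""
    if rho <= 0.0:
        return max(y - c, 0.0)
    d1 = (math.log(y / c) + 0.5 * rho * rho) / rho
    return y * norm_cdf(d1) - c * norm_cdf(d1 - rho)


def implied_rho(caplet_price, delta, B0, K0, c, hi=5.0, tol=1e-12):
    """Solve delta * B0 * g(K0) = caplet_price for rho = sqrt(int_0^T gamma^2(v) dv)."""
    target = caplet_price / (delta * B0)
    lo = 0.0
    if not g(K0, c, lo) <= target <= g(K0, c, hi):
        raise ValueError("caplet price not attainable with rho in [0, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(K0, c, mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def bootstrap_gamma(maturities, rhos):
    """Piecewise-constant gamma from cumulative variances rho_k^2 = int_0^{T_k} gamma^2."""
    gammas, prev_T, prev_var = [], 0.0, 0.0
    for T, rho in zip(maturities, rhos):
        piece = rho * rho - prev_var  # int_{T_{k-1}}^{T_k} gamma^2(v) dv
        if piece < 0.0:
            raise ValueError("implied variances must be nondecreasing in maturity")
        gammas.append(math.sqrt(piece / (T - prev_T)))
        prev_T, prev_var = T, rho * rho
    return gammas
```

In use, each quoted caplet price gives one `implied_rho`, and `bootstrap_gamma` turns the resulting cumulative variances into one volatility level per interval.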


To implement BGM, we need both $\gamma(\tau)$, $\tau \ge 0$, and

$$\sigma^*(t,\tau),\qquad t \ge 0,\ 0 \le \tau < \delta.$$

Now $\sigma^*(t,\tau)$ is the volatility at time $t$ of a zero coupon bond maturing at time $t + \tau$ (see (2.6)). Since $\delta$ is small (say $\frac14$ year), and $0 \le \tau < \delta$, it is reasonable to set

$$\sigma^*(t,\tau) = 0,\qquad t \ge 0,\ 0 \le \tau < \delta.$$

We can now solve (or simulate) to get

$$L(t,\tau),\qquad t \ge 0,\ \tau \ge 0,$$

or equivalently,

$$K(t,T),\qquad t \ge 0,\ T \ge 0,$$

using the recursive procedure outlined at the start of Section 34.6.


34.12 Long rates 


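The long rate is determined by long maturity bond prices. Let $n$ be a large fixed positive integer, so that $n\delta$ is 20 or 30 years. Then

$$\frac{1}{D(t,n\delta)} = \frac{D(t,0)}{D(t,n\delta)} = \prod_{k=1}^n\frac{D(t,(k-1)\delta)}{D(t,k\delta)} = \prod_{k=1}^n\big(1 + \delta L(t,(k-1)\delta)\big),$$

where the last equality follows from (4.1). The long rate is

$$\frac{1}{n\delta}\log\frac{1}{D(t,n\delta)} = \frac{1}{n\delta}\sum_{k=1}^n\log\big(1 + \delta L(t,(k-1)\delta)\big).$$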


34.13 Pricing a swap 


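Let $T_0 > 0$ be given, and set

$$T_1 = T_0 + \delta,\quad T_2 = T_0 + 2\delta,\quad \ldots,\quad T_n = T_0 + n\delta.$$

The swap is the series of payments

$$\delta\big(L(T_k,0) - c\big)\ \text{at time } T_{k+1},\qquad k = 0,1,\ldots,n-1.$$

For $0 \le t \le T_0$, the value of the swap is

$$\sum_{k=0}^{n-1}\beta(t)\,\mathbb E\Big[\frac{\delta}{\beta(T_{k+1})}\big(L(T_k,0) - c\big)\,\Big|\,\mathcal F(t)\Big].$$

Now

$$L(T_k,0) = \frac1\delta\Big[\frac{1}{B(T_k,T_{k+1})} - 1\Big].$$

We compute

$$\begin{aligned}
\beta(t)\,\mathbb E\Big[\frac{\delta}{\beta(T_{k+1})}\big(L(T_k,0) - c\big)\,\Big|\,\mathcal F(t)\Big]
&= \beta(t)\,\mathbb E\Big[\frac{1}{\beta(T_{k+1})}\Big(\frac{1}{B(T_k,T_{k+1})} - 1 - \delta c\Big)\,\Big|\,\mathcal F(t)\Big] \\
&= \beta(t)\,\mathbb E\Big[\frac{1}{\beta(T_k)\,B(T_k,T_{k+1})}\,\mathbb E\Big[\frac{\beta(T_k)}{\beta(T_{k+1})}\,\Big|\,\mathcal F(T_k)\Big]\,\Big|\,\mathcal F(t)\Big] - (1 + \delta c)\,B(t,T_{k+1}) \\
&= \beta(t)\,\mathbb E\Big[\frac{1}{\beta(T_k)}\cdot\frac{B(T_k,T_{k+1})}{B(T_k,T_{k+1})}\,\Big|\,\mathcal F(t)\Big] - (1 + \delta c)\,B(t,T_{k+1}) \\
&= B(t,T_k) - (1 + \delta c)\,B(t,T_{k+1}).
\end{aligned}$$

The value of the swap at time $t$ is

$$\begin{aligned}
\sum_{k=0}^{n-1}\beta(t)\,\mathbb E\Big[\frac{\delta}{\beta(T_{k+1})}\big(L(T_k,0) - c\big)\,\Big|\,\mathcal F(t)\Big]
&= \sum_{k=0}^{n-1}\big[B(t,T_k) - (1 + \delta c)\,B(t,T_{k+1})\big] \\
&= B(t,T_0) - (1 + \delta c)B(t,T_1) + B(t,T_1) - (1 + \delta c)B(t,T_2) + \cdots + B(t,T_{n-1}) - (1 + \delta c)B(t,T_n) \\
&= B(t,T_0) - \delta c\sum_{k=1}^n B(t,T_k) - B(t,T_n).
\end{aligned}$$

The forward swap rate $w_{T_0}(t)$ at time $t$ for maturity $T_0$ is the value of $c$ which makes the time-$t$ value of the swap equal to zero:

$$w_{T_0}(t) = \frac{B(t,T_0) - B(t,T_n)}{\delta\sum_{k=1}^n B(t,T_k)}.$$

In contrast to the cap formula, which depends on the term structure model and requires estimation of $\gamma$, the swap formula is generic.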
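Because the swap formula is generic, it needs only discount factors. The sketch below evaluates the swap value and the forward swap rate $w_{T_0}(t)$ from a list of bond prices $B(t,T_0), \ldots, B(t,T_n)$. The flat-curve helper `flat_curve_bonds` is a hypothetical test input; for such a curve the swap rate reduces to the simple rate $(e^{r\delta} - 1)/\delta$, because $B(t,T_k) - B(t,T_{k+1}) = (e^{r\delta} - 1)\,B(t,T_{k+1})$.

```python
import math


def swap_value(bonds, delta, c):
    """B(t,T0) - delta*c*sum_{k=1}^n B(t,Tk) - B(t,Tn); bonds = [B(t,T0), ..., B(t,Tn)]."""
    return bonds[0] - delta * c * math.fsum(bonds[1:]) - bonds[-1]


def forward_swap_rate(bonds, delta):
    """w_{T0}(t) = (B(t,T0) - B(t,Tn)) / (delta * sum_{k=1}^n B(t,Tk))."""
    return (bonds[0] - bonds[-1]) / (delta * math.fsum(bonds[1:]))


def flat_curve_bonds(r, t, T0, delta, n):
    """Hypothetical flat continuously compounded curve: B(t,T) = exp(-r (T - t))."""
    return [math.exp(-r * (T0 + k * delta - t)) for k in range(n + 1)]


bonds = flat_curve_bonds(r=0.04, t=0.0, T0=1.0, delta=0.5, n=4)
w = forward_swap_rate(bonds, 0.5)  # makes swap_value(bonds, 0.5, w) vanish
```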


